WORLD VIEW How Hollywood 
has got it right when it comes 
to stammering p.7 


EDITORIALS 


WASPS Beat of a drum turns queens to workers p.9
EGYPT Citizens act to protect Cairo antiquities museum p.10


Earth 2.0 


The hunt is on for a distant planet similar to our own. Astronomers should decide just how similar it
needs to be, before the candidates start pouring in.


The search for a second Earth has long enthralled readers of
science fiction. What rich and varied life could it contain? What
would such a discovery mean for humanity’s own place in the
Universe? How many similar planets are out there? The question is
more than a philosophical puzzle, and it comes with a hard scientific
edge that should be considered sooner rather than later. As the search
for planets beyond the Solar System widens and public interest in the
quest grows, at which point should astronomers declare the hunt for
another Earth a success?

Hundreds of candidate planets have been identified, and some have
been profiled, if not as a second Earth, then as signs that the search
is heading in the right direction. Last month, NASA announced the
discovery of the smallest extrasolar planet yet: Kepler-10b, which has
1.4 times the diameter and 4.6 times the mass of Earth, and was
discovered by NASA’s Kepler spacecraft (see page 24). Although the planet
orbits too close to its star to support life, the news was heralded by some
media outlets as a landmark in the search for a new Earth, particularly
because Kepler-10b is the first exoplanet with a dense and rocky core.

Attention on Kepler’s mission will intensify again this week, as
NASA publicly releases a batch of its data (see page 53). The satellite
focuses on a single point in the sky, where it can keep track of some
150,000 stars. Kepler observes the decrease in the brightness of these
stars as planets pass in front of, or ‘transit’, them, and the findings are
used to target telescopes on the ground.

It takes three to four such confirmed transits before astronomers
are confident that they have found a planet, which makes it too soon
to be sure whether Kepler has found a world truly similar to Earth.
(By definition, Earth-like planets orbiting a star similar to the Sun
pass in front of their stars about once a year, and Kepler has only been
in place for about 18 months.) All exoplanets confirmed to date orbit
much closer to their stars than does Earth; they are too close for
conditions to allow the existence of liquid water, which is what defines a
star’s ‘habitable zone’.

As more data are analysed, they will probably produce a string of
reports of ever-smaller planets, until we get an Earth-sized example.
Many of these small planets are likely to orbit M-dwarfs, by far the
most numerous type of star in the Universe (see page 27). The
habitable zone around these stars is very narrow, but Kepler may find a
rocky planet there. Would that be the first Earth-like planet? Probably
not if, as seems likely, it were to be tidally locked, so that one side faced
permanently towards the star.

What about planets that orbit larger stars? Does a first Earth-like
planet have to orbit in the habitable zone of a G2-type star, similar
to the Sun? If so, must the planet be Earth-sized? And is the focus
on a habitable zone defined in terms of liquid water appropriate? As
the Universe reveals its secrets, we discover it to be a more diverse
and stranger place than we had anticipated. Would it be so odd to
conceive of life on a dry or frozen world? Must the first Earth-like
planet be capable of supporting life, or human life in particular?
The answers to these questions are important because the public- 
relations rewards of planet-hunting — and planet-finding — are 
great. The temptation to hype each discovery is equally large, but so 
is the scope for confusion and public scorn,
especially given the rabid response on some
blogs to NASA announcements. Set the bar
for ‘Earth-like’ planets too low, and a string of
repeated discoveries could be overwhelming.
Set the bar too high, and a planet that meets
the strict criteria may not emerge at all. If that
were to happen, the Kepler mission would risk being viewed as a
failure — which it most certainly is not.

Amid the excitement of exploring a new frontier, astronomers should
pause to consider the public reaction to their work. Then they should
decide how a standard should be set. Perhaps a reasonable starting
point would be to define an Earth-like planet as one of similar size to
Earth, orbiting in the habitable zone of any star, and not tidally locked.
More important than the details of the definition is that the relevant
criteria are established before the claims start to pile up. To announce
the discovery of the first Earth-like planet would be a stunning success.
To announce it more than once could look like carelessness.
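The timing argument in this editorial (three to four transits needed, roughly one transit a year for an Earth-like orbit, about 18 months of observing so far) can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function name and the best-case assumption that the first transit falls at the very start of observing are ours, not Kepler's actual detection pipeline.

```python
def months_until_nth_transit(period_months: float, n_transits: int) -> float:
    """Minimum observing time needed to catch n transits.

    Best case (an assumption): the first transit happens the moment
    observing starts, so n transits span (n - 1) full orbital periods.
    """
    if n_transits < 1:
        raise ValueError("need at least one transit")
    return (n_transits - 1) * period_months


EARTH_LIKE_PERIOD_MONTHS = 12.0  # about one transit a year, as stated in the text
OBSERVED_MONTHS = 18.0           # Kepler's time in place, as stated in the text

for n in (3, 4):
    needed = months_until_nth_transit(EARTH_LIKE_PERIOD_MONTHS, n)
    # Report whether the observing time so far is enough for n transits.
    print(n, needed, needed <= OBSERVED_MONTHS)
```

Even in this best case, three transits of a one-year orbit need at least 24 months, which is why 18 months of data cannot yet confirm a true Earth analogue.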


Preserve the past 


Historic scientific collections deserve better 
than to gather dust. 


When the celebrated anatomist Antonio Scarpa died in 1832,
he left an extensive collection of anatomical preparations
to his university in Pavia, Italy. The collection includes his
own head which, pickled, now presides grimly over his legacy in a 
protected museum. 

Across Europe, a distressingly high number of historic scientific 
collections — from herbaria to minerals — are being lost or left to rot 
in universities. As many are autonomous institutions, they can’t be 
told what to do by governments, they are mostly poorly funded and 
they tend to be oriented to the future, not the past. Historic collections 
have to compete for space and resources with active researchers, and 
are rarely prioritized. 

Germany may have come up with a way to break out of this dilemma. 
Earlier this week, the Wissenschaftsrat, the nation’s influential science 
council, issued a detailed list of recommendations that declares that 
scientific collections of potential research value should be handled as 


3 FEBRUARY 2011 | VOL 470 | NATURE | 5 


© 2011 Macmillan Publishers Limited. All rights reserved 




research infrastructures. It says that universities have a duty to pre- 
serve collections that it describes as being ‘in critical condition’ and 
make them available to internal and external researchers — and to 
integrate them as appropriate into teaching programmes. 

The council also details how this should be done. Universities, it 
says, together with Germany’s research museums and the country’s 
main granting agency, the DFG, should develop criteria to assess the 
scientific merit of a collection. These criteria should then be applied 
in a hard-nosed fashion so that inferior collections are closed or
transferred elsewhere. Furthermore, historic scientific collections in
universities should be allocated the space they need, including a room for
researchers to work on them and for exhibitions. 

This prioritization is important. Historic objects frequently turn out 
to have great — often unexpected — value for cutting-edge research. 
Well-preserved old bones, for example, are a treasure trove for modern 
palaeontologists wielding new DNA-based analytical technologies. 
Old herbaria can similarly feed the curiosity of today’s plant geneticists. 
Historic collections can also be unique resources for social scientists, 
particularly science historians. So the Wissenschaftsrat’s endorsement 
of their fundamental value is extremely welcome. 

But will the recommendations be taken up? Most probably yes. The 
Wissenschaftsrat has serious clout because it comprises representa- 
tives from both the federal and state governments, as well as scientists. 
Its procedures are systematic and its analyses are thorough. It makes 
no recommendations that its members know they will not be able to 
pay for. It could, however, take some time for the recommendations 
to be implemented. 

In the case of scientific collections, the Wissenschaftsrat proposes 
that the federal government issues a call for proposals for a five-year 
project to coordinate the efforts of universities to save their collections. 


Scientists at German universities — which are funded by state gov- 
ernments, and constitutionally banned from receiving infrastructure 
funding from the federal government — should find short-term grants 
from research agencies or foundations to upgrade, restore and make 
their collections available. The universities themselves should then 

provide overheads for ensuring that the collections
are looked after in the long term, a
sum that the Wissenschaftsrat says should be
modest.

These recommendations did not emerge
from a vacuum. In 2004, the DFG supported
a five-year project to identify and catalogue
collections in German universities. It identified more than a thousand, 
of which nearly 300 were shown to have been lost or destroyed. Her- 
baria that were once state-of-the-art, for example, were confined to 
dusty cellars or stuffy attics when classical botanics fell out of research 
fashion. 

This DFG programme won the admiration of scientists in many 
countries that still have no national catalogue of the treasures hidden 
in their own universities and no systematic way of preparing one. The 
Wissenschaftsrat’s initiative is now winning admiration for its long- 
term vision and political commitment. 

There will certainly be battles to come. Simply saying that space 
should be made available for collections isn’t helpful if there is
genuinely no space to be had, for example. But the value that the Wissen-
schaftsrat now places on collections should make such battles easier 
to win. Research organizations in other countries should look to see 
if they could follow its lead. A collection deemed scientifically valu- 
able doesn't need to be as peculiar as Scarpa’s head to make it worth 
preserving, but it needs the same protections and accessibility.


Tough on truth 


The Global Fund should be praised for coming 
clean about fraud by grant recipients. 


‘Fraud plagues global health fund,’ screamed the title of an
article published last month by the Associated Press, which
alleged that: “A $21.7 billion development fund backed by
celebrities and hailed as an alternative to the bureaucracy of the United
Nations sees as much as two-thirds of some grants eaten up by
corruption, The Associated Press has learned.”

Journalistic scrutiny of aid is welcome and revelations of widespread
and large-scale fraud by recipients of grants from the Global Fund to
Fight AIDS, Tuberculosis and Malaria would be a big deal. The fund,
created by the highly industrialized countries of the G8 forum in 2002,
now accounts for one-quarter of all international financing to fight
AIDS, two-thirds of that for tuberculosis, and three-quarters of that
for malaria. But despite using the phrase “has learned” — journalist
shorthand for a scoop — the Associated Press (AP) article’s central
claims contained no new revelations. The frauds mentioned — involving
grants to Mali, Mauritania, Djibouti and Zambia — had already
been made public by the fund itself.

The sums involved in the reported fraud cases amount to US$39
million, of $13 billion that the fund has disbursed, but other fraud
cases have no doubt so far gone undetected. Although any corruption
is too much, to keep it down to these levels would be an achievement,
given the realities of putting large amounts of money into any country
or project, not least those where corruption can be rife.

Nonetheless, Sweden, Germany and Ireland have responded with
suggestions they may suspend their pledges to the fund for the period
covering 2011-13. As fund members they are well aware of how it handles
corruption, so their response is probably partly a reaction to the wide
publicity that the AP article received in the international media, and the 
sensationalist and exaggerated claims about the scale of the problem 
— no government, accountable as it is to taxpayers, wants to be seen 
as lax on corruption. As Nature went to press, reports suggested that 
funding from these countries would be restored, while Sweden has also 
since said that it is happy with the way the Global Fund is dealing with 
the problem. 

The reputation of the fund — which by its own estimates saved more 
than 4.9 million lives by 2009 — has been unfairly tarnished, and its 
fund-raising efforts perhaps hampered at a time when the economic cri- 
sis is already making donors reconsider the size of their contributions. 
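To put the scale of the reported fraud in proportion, the two figures quoted in this editorial (US$39 million in reported fraud, out of $13 billion disbursed) can be turned into a share. This is a back-of-the-envelope sketch: the function name is ours, and the result covers only detected cases, which the editorial notes are unlikely to be the whole picture.

```python
REPORTED_FRAUD_USD = 39e6  # reported fraud cases, as quoted in the text
DISBURSED_USD = 13e9       # total disbursed by the fund, as quoted in the text


def fraud_share(fraud_usd: float, disbursed_usd: float) -> float:
    """Return detected fraud as a fraction of money disbursed."""
    if disbursed_usd <= 0:
        raise ValueError("disbursed amount must be positive")
    return fraud_usd / disbursed_usd


share = fraud_share(REPORTED_FRAUD_USD, DISBURSED_USD)
print(f"Detected fraud: {share:.2%} of disbursements")
```

On these figures the detected share is about 0.3% of disbursements, far from the "two-thirds of some grants" headline figure, which refers to individual grants rather than the fund as a whole.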

When it comes to being transparent over problems of corruption in 
recipient countries the Global Fund has been far better than most aid 
donors or agencies. It has openly tackled corruption — with a ‘zero 
tolerance’ policy, suspending grants at the first whiff of wrong-doing, 
and working with recipient countries to bring fraudsters to justice 
and recover what misdirected money it can. Could it do more? Yes: 
for example, by strengthening oversight further. But it is already well 
down the road to effectively tackling corruption. 

The same cannot be said for many of the alphabet-soup of aid agen- 
cies, which choose not to publicise their own uncovered fraud cases, 
perhaps out of fear of damaging their image, and losing donors. 
Several observers have been quick to point out that if the AP article 
has an upside, it is to have drawn renewed attention to fraudulent use 
of funds by such agencies. The fight against aid corruption has gener- 
ally improved markedly since the 1990s, but many agencies still fall far 
below the high bar set by the Global Fund. Meanwhile, astonishingly, 
the fund’s own fraud investigations have been hampered because the 
United Nations Development Programme, which 
manages some of its grants, has refused to allow 
the fund access to its records. Scrutiny should be 
welcomed, but honesty should not carry so high 
a price.
WORLD VIEW

Listen to the lessons of
The King’s Speech

A film that shows King George VI struggling with a stammer could raise
awareness and change treatments, says Peter Howell.

The recently released film The King’s Speech looks set to sweep
the board at the Oscars later this month, and actor Colin Firth
has been widely praised for his depiction of King George VI as
he seeks to overcome a debilitating stammer. As a scientist who studies
stammering (or stuttering as it is also called), I liked the film. Many
people still poke fun at those who stammer, and find their difficulties
amusing. Perhaps the film’s success, with raised public awareness of
the condition, will help to address such prejudice.

We cannot be sure of the accuracy of the historical events the film
portrays, although many of the scenes seemed to me to ring true. In
many ways, we understand a lot more than we did in George VI’s day.
In others, we are just as blind. Although 1 in 20 children stammer,
most grow out of it. Of the 1 in 100 people who still stammer as they
enter their teenage years — including, it seems,
George VI — few go on to recover. For them, it
becomes a question of how to manage the
condition. This is an important lesson from the film:
we should not give false hope to the parents of
children who are unlikely to recover from their
stammer.

Techniques to control stammer in adults
can be effective, as the film shows. But without
spoiling it too much for those who have not
seen it (and you should!), it is fair to say that
getting people who stammer to repeatedly shout
rude words would no longer be suggested as a
technique, if it ever was.

Those who stammer, however, do often find
they can speak fluently when they are prevented
from hearing their own voice. The film illustrates
this when George VI is shown speaking fluently
when music is played. Playing their speech back to people in real time,
but delayed or shifted in frequency, can also help. We still do not fully
understand why this happens, but prosthetic devices are available that
produce this effect and bring about improvement. The most discreet
of these cost up to US$5,000, although opinion in my community is
divided as to whether such technology is appropriate, because the
stammering returns when the device is switched off.

More controversially, some researchers believe they can induce fluency
in people, children in particular, using verbal operant procedures,
similar to the reward and punishment techniques used to train dogs
and other animals. The most common of these is the Lidcombe
Program, developed by researchers at the Australian Stuttering Research
Centre at the University of Sydney. Although some data suggest that
the technique works, the numbers are too low to
draw firm conclusions. A major sticking point
is that there is no commonly agreed method
to diagnose the disorder in childhood. The
Australian group considers children who
commonly repeat whole words as stammerers. In my view, this is
wrong and could skew the results.

NATURE.COM Discuss this article online at: go.nature.com/nad2nl

Much of the controversy over treatments such as the Lidcombe 
Program and prosthetic devices centres on whether the benefits 
linger when the treatment ends. Such carry-over would demand that 
stammering is at its heart a learned behaviour that can be undone. I 
do not think it is. 

So, what causes people to stammer? And how can we distinguish 
between children who will recover from their stammer, and those who 
will not? These are questions of great importance for both sufferer
and society. Comparison of people who do or do not recover sug- 
gests that several factors are important. Biological (genetics and brain 
differences), linguistic and motor factors, and type of stammering 
symptom are reliably reported to differ between 
such groups. 

Much attention in the media has focused on 
a ‘stuttering gene’ after the discovery of genetic
mutations in members of a consanguineous 
(inter-marrying) Pakistani family. The gene 
identified codes for proteins involved in cellu- 
lar lysosome function, which removes damaged 
molecules and viruses. Other geneticists, includ- 
ing Simon Fisher, director of the Max Planck 
Institute for Psycholinguistics in Nijmegen, the 
Netherlands, point out that the lysosome func- 
tion is broad and more research is needed on 
how it could affect the central nervous system of 
people who stammer. 

A study on a Chinese family suggests a more 
plausible genetic basis, as it reported mutations 
in genes that seem to affect parts of the brain (the 
basal ganglia) previously implicated in the disorder. 

My work has shown that the severity of stammering symptoms in 
eight-year-old children can be used to predict whether the children 
are likely to still stammer as teenagers and adults. Wider discussion is 
needed about how this information should be used by therapists — for 
example, whether to intervene with treatment. As most people will 
recover in time, watchful monitoring of children for signs that their 
stammering is worsening may be a better approach. 

If Colin Firth’s performance is a fair reflection of George VI’s speech, 
then the types of stammer he produced — prolonged consonants and 
repetition of the first part of words — would, as a child, have suggested
he had little chance of recovery. We should make more effort to pass 
this information on to parents. Expectations must be realistic, but, as 
The King’s Speech shows, they can still be optimistic.


Peter Howell is professor of experimental psychology at University 
College London and author of Recovery from Stuttering. 
e-mail: p.howell@ucl.ac.uk 
Selections from the
scientific literature 


RESEARCH HIGHLIGHTS 


CANCER
Weakening 
tumour defences 


Blocking just one enzyme 
could make some tumours 
easier to kill with radiation 
therapy. 

Fei-Fei Liu at the University 
of Toronto in Ontario, Canada, 
and her team found that head 
and neck cancer cells treated 
with small RNA molecules that 
silence a gene called UROD 
are unusually susceptible to 
ionizing radiation. UROD 
encodes an enzyme involved 
in producing iron-containing 
haem molecules, and reducing 
its levels caused oxidative 
damage and cell death. When 
implanted into mice that were 
then treated with radiation, 
tumour cells with lowered 
levels of the enzyme grew more 
slowly than cells containing 
normal levels. 

Furthermore, head and neck 
tumour samples expressing the 
lowest levels of UROD tended 
to come from patients who 
responded well to treatment. 
Drugs that inhibit UROD could 
make radiation treatments 
more effective, the authors say. 
Sci. Transl. Med. 3, 67ra7 (2011)


Cow spills guts
for biofuels

A dearth of ways to efficiently
digest the plant fibre cellulose
has stymied efforts to develop
plant-based biofuels. Genomic
sequencing of bacteria from
the cow’s digestive system has
turned up enzymes that might
address this problem, reports
a team led by Edward Rubin
of the Lawrence Berkeley
National Laboratory in
Berkeley, California.

The researchers sequenced
268 gigabases of DNA from
microbes that adhered to plant
material in the rumen of the
cow, a champion cellulose
digester. The microbes
were collected
directly from the cow
rumen through a
port (pictured). The
researchers identified
27,755 genes
possibly involved in
carbohydrate digestion,
and produced protein
from 90 that had
similarities to known
enzymes with particular
attributes. More than half
demonstrated enzymatic
activity against a panel of 10
cellulose-containing plants.
Science 331, 463-467 (2011)


MATERIALS

‘Soft’ robot has deft touch

Robots made from hard materials are not
well equipped to handle fragile objects, so
researchers have created prototypical ‘soft’
robots from elastic polymers. These can
perform delicate tasks such as picking up
eggs.

George Whitesides and his colleagues
at Harvard University in Cambridge,
Massachusetts, embedded balloon-like
channels in moulded silicones to create
materials that change shape in specific ways
when air is pumped into the channels. This can
generate complex motion from a single source
of pressure, leading to designs such as the team’s
starfish-shaped gripper (pictured).

With appropriate materials, this technology
might eventually produce robots that can
handle heavy loads or conduct electricity.
Angew. Chem. Int. Edn doi:10.1002/anie.201006464
(2011)


NEUROSCIENCE

Omega-3s affect
brain signalling

A diet low in omega-3 fatty
acids — typically found in fish
oils — has been associated
with mood disorders. To find
a molecular link, Sophie Layé
at the University of Bordeaux
in France, Olivier Manzoni at
the French National Institute
of Health and Medical
Research in Marseilles and
their co-workers fed mice a diet
that was either high or low in
omega-3s. They then looked
at the ability of neurons in the
prefrontal cortex, a brain region 
thought to mediate emotional 
behaviour, to alter the strength 
of their connections — a 
process known as synaptic 
plasticity. The authors focused 
on lipid signalling molecules 
called endocannabinoids and 
their receptors, which are 
involved in this process. 

They found that mice whose 
diets were low in omega-3s 
had lower levels of the fats, and 
reduced synaptic plasticity, 
in the prefrontal cortex. This 
was due to the decoupling of 
the cannabinoid receptors 
from certain proteins that 
normally bind to them. Mice 
fed the low-omega-3 diet also 
showed behavioural signs of 
depression and anxiety. 

Nature Neurosci. doi:10.1038/nn.2736 (2011)




POPULATION BIOLOGY 


Whales found 
where whaling was 


The North Atlantic right whale 
(Eubalaena glacialis) is one of 
the world’s rarest cetaceans, 
and little is known about its 
wintering or summering 
grounds, hampering 
conservation efforts. Now, 
researchers have documented 
signs of the whale in an area 
that was a whale-hunting 
ground in the late 1800s. 

David Mellinger at Oregon 
State University in Newport 
and his colleagues carried out 
a year-long acoustic survey 
at five sites in and around the 
Cape Farewell Ground waters, 
an area about 500 kilometres 
east of southern Greenland. 
The team recorded more than 
2,000 whale communication 
calls, mainly between July and 
November, suggesting that 
the area is still an important 
summer ground for the 
creatures. 

The data will help to guide 
the relocation of shipping 
lanes and restrictions on vessel 
speed to prevent collisions 
with the animals. 

Biol. Lett. doi:10.1098/rsbl.2010.1191 (2011)


Kill one species to 
save the rest 


The loss of one species from 
an ecosystem can have 
unpredictable — and on 
occasion catastrophic — 
cascading effects. A modelling 
study suggests a strategy for 
rescuing a troubled ecosystem: 
selectively remove one or more 
additional species. 

Sagar Sahasrabudhe 
and Adilson Motter of 
Northwestern University in 
Evanston, Illinois, showed 
that removing or partially 
suppressing one or more 
species in a food web at key 
time points after one member 
has become extinct saves other 
members of the web from the 
same fate. The duo used several 
model food webs, as well as 
two webs modelled with data 


derived from real ecosystems 
— the Chesapeake Bay off 
Maryland and Virginia, and the 
Coachella Valley in Southern 
California. 

The idea — a controversial 
one that may not sit well with 
some conservationists — relies 
on the fact that ecosystem 
networks tend to shift to a 
different stable arrangement 
after losing members. 

Nature Commun. doi:10.1038/ncomms1163 (2011)

For a longer story on this
research, see go.nature.com/2sckoo


Laser travels 
forwards and back 


One way to identify poisonous 
gases, or the vapours released 
by explosives, is to detect the 
effect of these molecules on 
laser light beamed through 
them. But practical detection 
devices should send and collect 
the laser beam from the same 
side of the gas cloud. Arthur 
Dogariu and his colleagues at 
Princeton University in New 
Jersey have taken a step towards 
this goal by demonstrating 
backwards lasing in air. 

They used an ultraviolet 
‘pump’ laser to break up oxygen
molecules. The same laser 
then excited the molecular 
fragments into generating an 
infrared beam. The region in 
the air that the pump laser was 
focused on was about 100 times 
longer than it was wide. So half 
of the infrared light was emitted 
forwards, and the other half 
travelled backwards towards 
the source. The returning beam 
carried fingerprints of other 
molecules in the air. 

Science 331, 442-445 (2011) 


COMMUNITY CHOICE

HIGHLY READ on pubs.acs.org in Dec 2010

Lights on for drug delivery

A biocompatible gel that sheds its load
when exposed to ultraviolet light might be
used for controlled delivery of drugs and
other molecules inside the body.

Weihong Tan at the University of Florida in Gainesville,
Xiaoling Zhang at the Beijing Institute of Technology and
their colleagues created this gel by first dissolving polymers
decorated with DNA strands in water. They added DNA
fragments that bind to the polymer-bound DNA, crosslinking
the polymers to form a gel. These fragments also carry
light-sensitive azobenzene molecules that, if hit by ultraviolet
light, cause the crosslinks to break, releasing any trapped
molecules. The gel unloaded a variety of test cargoes, including
nanoparticles, an enzyme and the cancer drug doxorubicin, in a
matter of minutes. Visible light restores the hydrogel’s structure.
Langmuir 27, 399-408 (2010)


Drummed into
submission

Paper wasps divide the
work of the colony between
different castes: workers
build and defend the nest,
whereas individuals destined
to become queens lay eggs.
Wasps do not inherit these
roles, but are instead
set on a particular
developmental path 
during the larval 
stage. Researchers 
have discovered 
how this occurs in 
the genus Polistes: 
queens use their 
antennae to drum 
near to or on nest cells 
containing larvae to 
turn them into workers. 
Sainath Suryanarayanan 
at the University of 
Wisconsin-Madison and his 
group used an electrical device 
to simulate this drumming on 
colonies that produce Polistes 
fuscatus wasps (pictured) 
destined to become queens. 
The wasps that emerged 
had the lean body type of 
workers. The link between 
the drumming — which for 
larger Polistes species is audible 
outside the nest — and gene- 
expression changes is not clear. 
Curr. Biol. doi:10.1016/j.cub.2011.01.003 (2011)


NEUROSCIENCE
Root of resilience 
under stress 


Some individuals react coolly 
to stressful events, whereas 
others slip into depression. 


Work in mice suggests 
that chemical 
modifications 
to the DNA 
may explain the 
difference. 
Shusaku Uchida and 
Yoshifumi Watanabe at 
Yamaguchi University 
in Japan and their 
colleagues subjected two 
genetically distinct 
strains of mice to 
chronic stress and 
then measured various 
proteins involved in 
neuronal growth and 
maintenance. The strain 
known to succumb to stress 
had lower than normal levels of 
a protein called GDNF in the 
brain’s striatum. The resilient
strain had higher amounts. 
The team found that 
histones — proteins that 
package up DNA and regulate 
gene transcription — on a
section of the Gdnf gene were 
modified differently between 
the two strains. This led to Gdnf 
repression in the susceptible 
mice and increased expression 
in the more resilient strain. 
Neuron 69, 359-372 (2011) 


SEVEN DAYS


GM alfalfa ruling 


A long-running battle over 
genetically modified (GM) 
alfalfa has ended with the 

US Department of Agriculture 
(USDA) deciding that farmers 
can plant the crop without 
restriction. Just a month 

ago, the USDA put forth a 
draft plan that would have 
limited where the herbicide- 
resistant crop could be 
planted, addressing organic 
farmers’ concerns about 
contamination of their fields. 
But the plan came under fire 
from politicians and lobbyists, 
and on 27 January, the USDA 
reversed its stance. 


Polio pledges 
Billionaire Bill Gates has 
called for a final push to 
eradicate polio. The Bill & 
Melinda Gates Foundation 
will add US$102 million 

to its annual $200-million 
pledge for the cause, Gates 
announced on 28 January. 
The United Arab Emirates 
and the United Kingdom also 
made funding commitments 
last week. The World Health
Organization’s Global Polio
Eradication Initiative, which
vaccinates children against the
virus, needs $720 million to
fill a funding gap in 2011-12.
Polio remains endemic in
Afghanistan, Pakistan, India
and Nigeria.


SOUND BITE
“It’s like a math teacher not believing in algebra.”

William Wallace, Washington
DC representative of the
National Association of
Biology Teachers in McLean,
Virginia, responds to a
study suggesting that most
biology teachers in publicly
funded high schools are
uncomfortable with teaching
evolution. See go.nature.com/yfgex8
for more.


Cairo museum survives Egyptian looters

Amid the public unrest in Egypt last week,
Cairo’s world-famous Museum of Egyptian
Antiquities seems to have been spared serious
damage. Looters tried to steal two mummy
skulls and damaged some 100 items at the
museum on 28 January. But other citizens
stepped in to guard the museum, which holds
precious antiquities including Tutankhamun’s
death mask, before the army secured the
building the next day (pictured). Zahi Hawass,
head of Egypt’s Supreme Council of Antiquities
in Cairo, says that the damaged items can be
restored, adding that the looters mainly stole
jewellery from the gift shop. Archaeological
sites outside the city may not have been so
lucky, reports suggested as Nature went to press.


US budget tension

In his annual State of the
Union address on 25 January,
US President Barack Obama
stressed the need to invest
in research — particularly
in biomedical science and
clean-energy technology —
but he also proposed a freeze
on annual domestic spending
for the next five years.
Obama's 2012 budget request
is expected in mid-February.
Congress has yet to vote on
the budget for 2011, which
is expected to include tough
austerity measures.


Weak innovation
Member states in the European
Union are losing their lead in
innovation over Brazil and
China, and are not catching
up with the United States
and Japan, according to the
European Commission's
annual Innovation Union
Scoreboard, released on
1 February. The report, based
mainly on data from 2008 and
2009, says that Europe's patent
revenues and business research
spending are particularly poor
compared with those of Japan.


China’s wind win
China has now installed more
wind-power capacity than
has been deployed by the
United States. The American
Wind Energy Association, in
a 24 January report, said that
the US wind industry installed


5.1 gigawatts of capacity last 
year — half the 2009 total — 
to reach a total capacity of 
40.2 gigawatts. Data released 
on 12 January by the Chinese 
Renewable Energy Industries 
Association show that China 
installed 16 gigawatts in 2010 
to reach 41.8 gigawatts. 


RESEARCH
Vostok drilling 


Russian researchers drilling 
down to the sub-glacial Lake 
Vostok, 3,750 metres under 
Antarctica's ice sheet, told 
Nature that they hope to reach 
within 20 metres of the lake's 
surface by 6 February — the 
last day of Antarctic summer 
operations. But they do not 
expect to penetrate the pristine 
lake this season. Operations 
will resume in December. 


THE RACE TO LONGER LIVING
Improvements in life expectancy in the United States are lagging
behind those in other wealthy countries.


Smoking gun

Life expectancy in the United 
States is lower than in many 
other wealthy countries (see 
chart), despite the country’s 
huge health-care spending. 
The popularity of smoking
in past decades and today’s
rising obesity rates are partly 
to blame, a report from the US 
National Research Council 
concluded last week. 


LHC extension 

The Large Hadron Collider 
(LHC) will run until the end of 
2012. The particle accelerator 
at CERN, Europe's high- 
energy physics laboratory 
near Geneva, Switzerland, 
had been due to stop in 2011 
for a year-long shutdown to 
upgrade its collision energy 
to 14 teraelectronvolts
(TeV). But after a meeting in
Chamonix, France, last week,
scientists decided to give
themselves an extra year to
collide particles — currently at
7 TeV but potentially reaching
8 TeV — with the hope of
collecting enough data to spot
the Higgs boson. See
go.nature.com/wd9fug for more.


Dengue control 


Six thousand mosquitoes
genetically engineered to
be sterile were released in
Pahang state, Malaysia, on
21 December. But the field trial,
which aims to control dengue 
fever by suppressing mosquito 
populations, was announced 
only on 26 January, by 
Malaysia's Institute for Medical 
Research in Kuala Lumpur. 
Scientists and advocacy groups 
were surprised, as they believed 
the trial had been postponed. 


It follows larger trials on the 
Caribbean island of Grand 
Cayman in 2009 and 2010, 
all run by Oxitec, a company 
based in Oxford, UK. 


Intel research 

Intel is investing
US$100 million in US
universities over five years, by
opening six to eight science
and technology centres
on university campuses
throughout 2011. Announcing
the investment on 26 January,
the company said that its
first centre will be at Stanford
University in California, 
researching visual computing. 


Solar sailing 

Two trials of solar sails, which 
use the pressure of photons 
from the Sun to propel 
spacecraft, are going well. On 
26 January, Japan’s Aerospace 
Exploration Agency said that 
its Ikaros space capsule, which 
has a 200-square-metre solar 
sail, had completed six months 
of space flight and would have 
its mission extended to March 
2012. Six days earlier, NASA 
deployed a 10-square-metre 
solar sail, NanoSail-D, in 
low-Earth orbit. 


Pharma pruning

In the latest round of drug-industry
cutbacks, Abbott
Laboratories, headquartered
in Abbott Park, Illinois, will
cut 1,900 jobs, of which the
firm said “a small number”
were in research and
development. Elan, based in
Dublin, will cut 130 jobs —
about 10% of its workforce.
Around half of these are
scientists, and most are based
at the biotech firm’s research
and development facility in
South San Francisco.


Biotech buy
Biotechnology giant Amgen
will pay US$425 million to
acquire a cancer-vaccine
company. Amgen, based in
Thousand Oaks, California,
could also spend up to
$575 million in milestone
payments for BioVex Group.
The biotech, based in Woburn,
Massachusetts, develops
tumour-killing viruses
that also provoke immune
responses against the cancer.


Drug deal near


Drug-maker Sanofi-aventis
of Paris was reported to be
nearing a deal to buy Genzyme
of Cambridge, Massachusetts,
as Nature went to press. Sanofi
is said to have boosted its
rejected US$18.5-billion bid.
Last month, Sanofi suffered a
setback when late-stage clinical
trials showed that its anticancer
drug iniparib failed to slow
advanced breast cancer.


4 FEBRUARY
The European Council
discusses energy
policy and European
innovation, at a summit
meeting in Brussels.
go.nature.com/olubai


10-11 FEBRUARY
The future of the delayed
LISA Pathfinder mission
(see Nature 469, 280;
2011) will be assessed
by the European Space
Agency's Science
Programme Committee.


TREND WATCH


Although cities generate most
of the world’s greenhouse-gas
emissions, their per-capita
emissions vary widely (see
chart). New York, for example,
has half the per-capita emissions
of Denver, thanks to its denser
population and lower reliance
on cars, notes a study published
last month (D. Hoornweg
et al. Environ. Urban.
doi:10.1177/0956247810392270;
2011). Rotterdam’s per-capita
emissions are particularly high
because its port attracts industry
and fuelling of ships.


CITIES’ CARBON EMISSIONS
Average per-capita greenhouse-gas emissions vary widely
between cities even within a country.
[Chart: legible city labels are Toronto and Calgary (Canada); Shanghai (China); Hamburg (Germany); Rotterdam (Netherlands); Denver and New York (United States).]
Induced pluripotent stem cells retain ‘memories’ of the adult cells from which they are derived.


Flaw in induced-stem-cell model


Adult cells do not fully convert to embryonic-like state.


BY ELIE DOLGIN 


Medical researchers’ hopes of replacing
ethically fraught embryonic stem
(ES) cells with stem cells derived
from adult tissues have suffered a setback. 
Induced pluripotent stem (iPS) cells, created 
by turning back the developmental clock on 
adult tissues, and ES cells display similar gene- 
expression patterns, and both can produce any 
of the various tissues in the human body. But 
patterns of epigenetic changes — alterations 
that affect gene expression without changing 
the DNA sequence — tell a different story about 
iPS cells, a team led by Joseph Ecker, a molecu- 
lar geneticist at the Salk Institute in La Jolla, 
California, reports online in Nature this week¹.

“They are slightly different creatures,” says
Chad Cowan, a stem-cell biologist at Massa- 
chusetts General Hospital in Boston who was 
not involved in the work. The finding suggests 
that iPS cells may not be suitable substitutes for 
ES cells in modelling or treating disease. 

Ecker and his colleagues analysed patterns of 
DNA methylation, a type of epigenetic change, 
across the genomes of 15 cell lines. These 
included four human ES cell lines, five iPS cell 
lines and the tissues from which they came, 
as well as differentiated cells made from both 
kinds of stem cells. “If you look with blinders 
on, they look fairly similar,” says Ecker. “But if 
you zoom in you find different signatures of 
what an iPS cell is.” 

The researchers found that rather than being 
reset to an embryo-like state, methylation
patterns near the tips and centres of chro- 
mosomes in the iPS cells resembled those in 
the adult tissues from which the iPS cells had 
been derived. This could constrain the types 
of tissues that the cells are capable of form- 
ing. “The reprogramming process, although 
fascinating, is a fundamentally different 
way of getting to pluripotency than deriving 
cells from [embryos],” says George Daley, 
a stem-cell expert at Children’s Hospital 
Boston in Massachusetts. “We're still looking 
for reprogramming methods that return cells 
to the ES-cell-like state,” he adds. 

The finding that reprogrammed stem cells 
carry an epigenetic ‘memory’ dovetails with 
work published last year by Daley and others 
comparing mouse iPS and ES cells²,³. In mice,
however, the methylation differences could 
be reset, either by continuing to culture the 
iPS cells or by differentiating the cells again to 
more specialized cell types. In the human cells, 
the epigenetic marks lingered even after the iPS 
cells had been coaxed to form new tissues. 

Regardless of their epigenetic differences, 
neither iPS cells nor ES cells may turn out to be 
perfect models of tissues in the body. Both cell 
types seem to harbour genomic abnormalities. 
In separate work published last month⁴, a team
led by Jeanne Loring, a stem-cell researcher at 
the Scripps Research Institute in La Jolla, found 
that ES cells tended to contain duplicated 
chunks of DNA linked to genes associated with 
self-renewal, whereas iPS cells incorporated 
extra cancer-causing genes and fewer tumour- 
suppressor genes. These genomic differences 
between the two types of stem cells probably 
result from the culturing techniques used to 
derive and maintain them. 

“When we culture cells outside a normal 
organism they can acquire features that may 
not be compatible with life once they go back 
into an organism,” says Richard Young, a stem-cell
biologist at the Whitehead Institute in
Cambridge, Massachusetts. 

The impact of such discrepancies remains
unclear, says William Lowry, a stem-cell biologist 
at the University of California, Los Angeles. “The 
problem is that we don't know if any of these 
differences are going to be consequential.”


1. Lister, R. et al. Nature doi:10.1038/nature09798 (2011).
2. Kim, K. et al. Nature 467, 285-290 (2010).
3. Polo, J. M. et al. Nature Biotechnol. 28, 848-855 (2010).
4. Laurent, L. C. et al. Cell Stem Cell 8, 106-118 (2011).
The US National Science Foundation has decided to give the ALMA Vertex Prototype Antenna, which can
probe the Universe using sub-millimetre-wavelength radio waves, to an institute in Taiwan. 


Antenna decision 
makes waves 


Procedural transparency is at issue as a US agency transfers 
a high-precision radio dish to an international partner. 


BY EUGENIE SAMUEL REICH 


Free to a good home: one 12-metre radio
antenna, perfect for high-resolution
sub-millimetre-wavelength astronomy. Pick
it up yourself; no guarantees. Estimated value:
US$10 million to $15 million.

It was nearly that straightforward. Last year,
the US National Science Foundation (NSF) put
out a call for expressions of interest in a prototype
antenna that it had funded to test specifications
for the Atacama Large Millimeter
Array (ALMA), a 66-dish radio observatory
now nearing completion in Chile. But what
began as an opportunity for some cutting-edge
science now has some US bidders crying foul
after the NSF told them in early January that it
is giving the ALMA Vertex Prototype Antenna
to an institute in Taiwan.

“We've tried to find out why the NSF made
the decision and we've been given only generalities,”
says Lucy Ziurys, an astrochemist at
the University of Arizona in Tucson and the
principal investigator for one of the US bids.
Any suggestion of improper decision-making
would be sensitive for the NSF at a time when
government agencies are bracing for scrutiny
from a budget-conscious Congress — and the
donation of a major piece of research hardware
outside the United States could raise uncomfortable
questions.

The antenna is valuable to astronomers 
because it is designed to probe the Universe 
using radio waves with wavelengths shorter 
than 1 millimetre — an under-explored region 
of the electromagnetic spectrum. Ziurys and 
her colleagues had proposed to put it on Kitt 
Peak, 60 kilometres southwest of Tucson, where 
it would be used to study the composition and 
dynamics of interstellar clouds, including star 
and planet formation. Instead, the dish will go 
to the Academia Sinica Institute of Astronomy 
and Astrophysics (ASIAA) in Taipei. 

Ziurys says that a group convened by the 
US National Radio Astronomy Observatory 
(NRAO) — which runs the facility near Socorro, 
New Mexico, where the antenna is currently 
located — ranked the University of Arizona’s bid 
above ASIAA’s for technical merit. But that was
not enough to sway Vernon Pankonin, deputy
division director for astronomical sciences at
the NSF, who says that he chose ASIAA as the
winning bidder after consulting with an anonymous
assessment group internal to the NSF. The
group considered the NRAO recommendation,
but concluded that ASIAA’s bid was superior
when several other factors, including scientific
merit, were taken into account. The decision
was not subject to an external peer review as an
NSF grant would be. “It is a transfer of property
completely independent of the NSF's grant and
award process,” says Pankonin.

At the NSF’s direction, the original call for 
bids was issued by the NRAO, whose director,
Fred Lo, is chairman of ASIAA’s advisory
panel and is friends with Paul Ho, director 
of the institute. Lo acknowledges the friend- 
ship and says that he may have discussed the 
antenna with Ho. He also says that he drafted 
the call for expressions of interest, which noted 
that proposals for the dish would be considered 
not only from the United States, but also from 
“the communities that form the North Ameri- 
can ALMA region (i.e. Canada and Taiwan)”. 
Lo says that the sentence was included at the 
request of the NSF; Pankonin says the proc- 
ess of deciding to include it was “interactive” 
between the NSF and NRAO. Lo says that after 
the call, he stayed out of the decision-making. 
“Precisely because of the potential charge of 
conflict of interest, the NRAO was quite care- 
ful. It took an objective process,” he says. 

Christine Boesz, a former inspector-general 
of the NSF, says that no matter what the final
decision was, it could be seen as problematic
for a person who could be partial to one bidder
to write a call for expressions of interest. “You
could raise the question of is it really arm’s-length
decision-making,” she says. Zachary
Kurz, spokesman for the Republican majority 
of the House Committee on Science, Space and 
Technology in the US Congress, says that com- 
mittee staff are “starting a dialogue with the 
NSF” about a possible conflict of interest. 

Officially, the antenna is being given to the 
Harvard-Smithsonian Center for Astrophysics 
in Cambridge, Massachusetts, which collabo- 
rated with ASIAA on the proposal, so on paper 
it will remain a US asset, says Pankonin. Never- 
theless, the proposal makes it clear that ASIAA 
will take responsibility for the antenna, which 
it plans to use for very-long-baseline interfer- 
ometry (VLBI) — a technique in which data 
from radio telescopes continents apart can be 
combined to produce high-resolution images. 
The group’s primary targets include the centre 
of the galaxy M87, which contains the only 
supermassive black hole beyond the Milky Way 
whose perimeter could potentially be imaged 
using VLBI. A location for the antenna has not 
yet been confirmed, but ASIAA is interested 
in an NSF site known as Summit Station, at 
the peak of the Greenland ice sheet, where the 
cold, dry air would allow the telescope to see 
even shorter wavelengths than are detectable 
at Kitt Peak.
China sets 2020 vision for science


Goals include commercialization of research and emphasis on energy, biomedicine
and information technology.


BY JANE QIU IN BEIJING 


China is betting that an ambitious programme
of applied research will help to
secure its future as an economic superpower.
Innovation 2020, unveiled last week
by the Chinese Academy of Sciences (CAS),
maintains support for basic research. But the
plan will place a new emphasis on translating
the research into technologies that can
power economic growth and address pressing
national needs such as clean energy, said Bai
Chunli, vice-president of the CAS, at the academy’s
annual conference in Beijing, where the
plan was announced.

Innovation 2020 is an extension of the 
Knowledge Innovation Programme (KIP) 
launched by the CAS in 1998. Under the KIP, 
the academy streamlined its often overstaffed 
and outdated institutes, attracted outstanding 
Chinese researchers who had trained abroad, 
and tightened up the way it evaluated project 
proposals and performance. But the CAS now 
needs to support new priorities, says Duan 
Yibing, a policy researcher at the CAS Institute 
of Policy and Management in Beijing. China 
has become a global economic power, and 
the world’s financial crisis has made scientific 
innovation more important to economic suc- 
cess than ever before, he says. “Things are a lot 
different now compared to 13 years ago.” 

Although the budget of Innovation 2020 is 
yet to be announced, insiders say it will be part 
of a continuing surge in the nation’s science 
spending (see ‘Spend, spend, spend’). Indeed,
the CAS’s expenditure on research and development
(R&D) in 2009 was about 20 billion
renminbi (US$3 billion), seven times the level
in 1998, according to a KIP assessment report
also released last week. This year’s budget for
the National Natural Science Foundation of
China will increase by 70%, from 10 billion
renminbi last year.


SPEND, SPEND, SPEND
China's investment in science has risen rapidly
over the past decade.

China is investing heavily in renewable-energy research as it builds its capacity in, for example, solar power.

Innovation 2020 will kick off with new 
projects this year in seven key areas, including 
nuclear fusion and nuclear-waste management; 
stem cells and regenerative medicine; and 
calculating the flux of carbon between land, 
oceans and atmosphere. Other priority areas 
include materials science, information tech- 
nology, public health and the environment. 

To coordinate resources better and to foster 
multidisciplinary research, the academy will set 
up three research centres for space science, clean 
coal technologies and geoscience monitoring 
devices. It also plans to build three science parks 
— in Beijing, Shanghai and Guangdong prov- 
ince, respectively — to accelerate the conver- 
sion of basic research into marketable products, 
especially in renewable energy, information 
technology and biomedicine. 

Pan Jiaofeng, deputy general secretary of 
the CAS, says the KIP’s 
track record bodes well 
for the success of the 
new programme. By the 
CAS's reckoning, in 2009, 
researchers that it funded 
published 3.5 times as many papers in journals
listed by the Science Citation Index (SCI) as 
in 1998. Crucially, the number of papers pub- 
lished in the top 1% of SCI journals, as judged 
by their impact factor, was 12 times that in 
1998. The CAS also calculates that research and 
development by the KIP generated an income 
of 140 billion renminbi and tax revenue of 
22 billion renminbi in 2009 — respectively 19.5 
and 14.5 times the levels in 2000. 

But the report acknowledges that there is 
substantial room for improvement. For exam- 
ple, CAS researchers should aim to become 
leaders of the international scientific commu- 
nity, and shift their focus away from generating 
as many papers as possible and towards genu- 
ine originality and innovation. 

With its emphasis on applied research, the 
new initiative also “presents a major challenge 
to the management and organizational capabilities
of the academy”, says Richard Suttmeier,
a science-policy researcher at the University of
Oregon in Eugene. He notes that most CAS 
institutes are focused on academic disciplines 
and lack the infrastructure needed for com- 
mercializing research or directing it towards 
national needs. 

Others think that the emphasis on applied 
research, national needs and revenue could 
stifle curiosity-driven research. Without that, 
says a Shanghai-based researcher who declines 
to reveal his identity, “it would be very difficult 
to have genuine innovation”.
Research commissioner Máire Geoghegan-Quinn faces calls for drastic changes to the EU funding system.


EU advisers urge 
funding reform 


The European Commission should free its Framework 
programme from political interference and red tape. 


BY NATASHA GILBERT 


Europe’s multi-billion-euro research
programme needs significant reform to
slash bureaucracy and ensure continued
support for cutting-edge science. That’s the 
verdict of the top science advisory group to 
the European Commission (EC). As the execu- 
tive body of the European Union (EU), the EC 
oversees the €50.5-billion (US$69.3-billion) 
Framework funding programme. 

In a set of unpublished recommendations 
made to the EC in December, and now seen 
by Nature, the European Research Area Board 
(ERAB) says that the management of Frame- 
work funds should be devolved to “independ- 
ent institutions at arm’s length of Commission 
and Member States influence”. Unless there 
is a “drastic” change in how the programme 
operates, it adds, “Europe's ability to compete 
or cooperate in the global environment will 
significantly diminish”. 

The warning comes at a crucial time for the
EC, as it prepares to launch a public consultation
of stakeholders on the successor to the
current Seventh Framework Programme (FP7),
Europe’s chief research-funding mechanism,
which ends in 2013 (see ‘Planning a Framework’).

NATURE.COM
For more European science news, see:
www.nature.com/regions/europe

Under FP7, the EC organizes research agendas 
through ten priority themes, such as energy 
and transport. ERAB suggests that agencies 
modelled on the European Research Coun- 
cil (ERC) — an EU initiative set up in 2007 
to award research grants solely on the basis of 
excellence — should instead be set up to sup- 
port these priority research areas. 

ERAB says that the EC, as well as the member 
states of the EU, would continue to have a role in
defining the proposed agencies’ overall strategy, 
including research priority areas and their budg- 
ets. But the agencies would execute the strategy 
and determine which proposals would receive
funding, with success judged on the delivery of
new discoveries, insights or technologies.


PLANNING A FRAMEWORK

Consultation launches on the future of EU research funding > 9 FEB 2011
Consultation closes > MAY 2011
Results announced > JUN 2011
Proposals presented by EC for next Framework programme (FP8) > END 2011
Proposals negotiated by EC, EU member states and European Parliament > 2012

The board acknowledges that some of these 
research programmes would probably have a 
higher risk of failure than many FP7 ventures, 
and says that the agencies would therefore need 
managers with “considerable responsibilities 
and powers” who are not restricted by “unnecessary
bureaucratic constraints”. John Wood,
ERAB’s chair, declined to comment on the 
recommendations ahead of their publication. 


HUGE HASSLE 

ERAB’s recommendations are likely to be wel- 
comed by many of Europe’s researchers, who 
have long deplored the EC’s excessive bureauc- 
racy and risk-averse approach to research 
funding (see Nature 463, 999; 2010). “There is 
a lot of administrative work. Proposals have to
be very detailed and precise. This is not always 
how science works,” says Antoine Peters, a 
molecular biologist at the Friedrich Miescher 
Institute for Biomedical Research in Basel, 
Switzerland. Peters adds that he has been put 
off applying for funding from the programme 
because “it is such a hassle. I avoid it if I can.
I’d rather go for national or local funding.”

The European Association of Research and 
Technology Organisations, a Brussels-based 
trade group, supports the idea of independent 
agencies managing research programmes, says 
Pauline Bastidon, the group's policy officer. In 
particular, it would like to see an agency, simi- 
lar to the ERC, in charge of funding for applied 
research and innovation, she adds. 

But Luke Georghiou, vice-president of 
research and innovation at the University of 
Manchester, UK, does not think that shifting 
responsibility to independent agencies is a 
panacea, and points out that the ERC has still 
had to battle EC bureaucracy. He proposes 
retaining the overall shape of the programme 
but with major simplifications, including more 
flexibility in calls for proposals. 

An EC spokesman declined to comment on 
ERAB’s report, adding that it would be taken 
into account during the consultation. That 
process will kick off when the EC releases a 
green paper outlining its proposals for the 
next Framework programme on 9 February. 
But a draft of the green paper, seen by Nature, 
may disappoint research leaders who were 
expecting to see a set of defined ideas. Instead, 
it lists 24 broad questions to be addressed in 
constructing the programme, but offers no 
firm options. For example, the document asks 
whether new rules could help to simplify the 
programme while giving it flexibility, but fails 
to suggest what these rules would look like. 

“The green paper doesn't say anything,” 
says a senior EU science official involved in 
Framework discussions, who asked not to be 
named, as commission rules forbid them from 
commenting on unpublished documents. “It 
makes me think the commission is not interested
in having a proper debate.”
Swindon, we have a problem


Britain’s space ambitions lack financial fuel. 


BY GEOFF BRUMFIEL 


For the head of a space agency, David Williams has had
a prosaic year. Rather than standing in
front of banks of computers, watching the test 
launch of a bold new rocket design, the interim 
director of the UK Space Agency (UKSA) has 
been shuffling between almost a dozen govern- 
ment entities, arranging the transfer of budget 
lines and staff members. “Most of what we've 
done this year has been internal,” he says. 

The UKSA was announced with much fanfare 
in March 2010, but its low-key start and lack- 
lustre annual budget of £206 million (US$332 
million; €241 million, see chart) has left some 
disappointed. “At the moment, 
we’re not happy,” says Richard
Peckham, the chair of UKspace, 
the industry's trade association. 
“But it’s still early days.” 

Britain has a difficult history 
in space. The country launched 
its first — and only — entirely 
home-grown rocket into orbit 
in 1971, but even before it flew 
the government had decided to 
kill the programme to cut costs. 
More recently, the 2003 failure of the Beagle 2 
Mars probe left the nation’s pride smarting. 
Despite these setbacks, the country’s private 
sector has made steady progress. Between 1999 
and 2007, the space industry grew by a steady 
9% per year and today boasts about £6 billion in 
annual revenues. Much of that money is made 
by small satellite manufacturers and telecom 
companies offering satellite-phone and Earth- 
observing capabilities. 

Until last year, the nation’s space agenda — 
which included environmental satellites and a
few collaborative missions with the European
Space Agency (ESA) — was overseen by the
British National Space Centre (BNSC), a 
somewhat toothless organization, also run 
by Williams. “Different government depart- 
ments held the budget for the bits of space they 
were interested in,” says Peckham. “The BNSC
partnership tried to glue that together.”

Industry wanted a more powerful agency, 
along with an increased focus on space as an 
area of economic growth and innovation. The 
then Labour government agreed and created 
the UKSA, along with a space innovation cen- 
tre at Harwell in Oxfordshire — where a new 
ESA centre had just been sited — that included 
a £40-million national space facility. 

But as the Swindon-based UKSA prepares 
to officially open for business 
on 1 April, it has yet to put for- 
ward a clear plan. “We are still 
waiting for the emergence of a
confirmed UK space strategy,” 
says Martin Ditter, head of the 
ESA centre at Harwell. 

The slow start is largely due 
to the ousting of the Labour 
government in May 2010. In 
December, the new coalition 
government announced that, 
as part of its austerity programme, the overall 
UKSA budget for the next four years would 
be almost flat. That makes it unlikely that the 
agency will launch its own spacecraft any time 
soon, although researchers hope that its uni- 
fying voice will help Britain to negotiate more 
prominent roles in ESA science missions. 

“Against the backdrop of 25% cuts in gov- 
ernment spending, they've probably done a 
pretty good job,” says Peckham. But there is a
tinge of disappointment in his voice as he adds,
“It doesn’t yet look like the empowered agency
that we want to see.”
Understanding why loneliness can spread through society like a disease is a key question for social scientists.


RESEARCH POLICY 


Social science lines up its 
biggest challenges 


‘Top ten’ crucial questions set research priorities for the field. 


BY JIM GILES 


How can we persuade people to look
after their health? Why do moods
spread like a contagion? How can
humanity increase its collective wisdom? 

These are some of the most pressing ques- 
tions that social scientists should tackle, 
according to a group of leading scholars in the 
field who hope that their ‘top ten’ list will help
shape the thinking of researchers and funding 
bodies for decades to come. 

In a parallel effort, the US National Sci- 
ence Foundation (NSF) last week unveiled 
the results of its own agenda-setting exercise, 
which asked social scientists to identify “grand 
challenge questions that are both foundational 
and transformative”. 

Both groups say that they ran the exercises 
because they wanted researchers to step back 
from immediate research priorities and iden- 
tify the most significant problems in their field. 
The results demonstrate the growing ambition 
of the social sciences to tackle difficult issues in
a quantitative way, addressing problems from
equality and wages to wars and health. 

The ‘top ten’ approach was inspired by a list
of 23 major unsolved questions compiled by 
the mathematician David Hilbert in 1900. The 
Hilbert problems helped to focus the attention 
of mathematicians throughout the following 
century. “He laid out the road map for twentieth- 
century math,” says Nick Nash, a vice-president 
at General Atlantic, an investment firm based 
in Greenwich, Connecticut. “What if we had a 
road map for other disciplines?” 

In 2008, Nash was studying for an MBA at 
Harvard University in Cambridge, Massachu- 
setts, when he proposed the road map to Stephen 
Kosslyn, then the university's dean of social sci- 
ence. Together, they organized a symposium 
at Harvard last April that gathered together 
‘big thinkers’ to present unsolved questions 
and to vote on which were the most important. 
The results are due to be released this week on 
Harvard's website (see go.nature.com/walhwf; 
see also ‘Top ten social-science questions’).
The site will also include a range of questions
submitted by members of the public.

At the symposium, Emily Oster, an economist
at the University of Chicago, Illinois, focused on 
a perennial challenge for public-health experts: 
how to get people to adopt healthier behaviours. 
For instance, persuading people to eat less and
exercise more — to control ballooning obesity 
rates — might be simple in theory; in practice 
it is extremely difficult. 

Because the rewards of behavioural change 
are often not apparent for years, Oster thinks 
the answer lies in programmes that offer an 
immediate pay-off. Preliminary studies have 
shown, for example, that cash rewards contin- 
gent on hitting weight-loss targets can help’. 
Even if payments amount to hundreds of
dollars a month and, to prevent a relapse, are
continued after dieters have shed their excess
pounds, the strategy might save society money
by reducing future medical expenses.


TOP TEN SOCIAL-SCIENCE QUESTIONS

1. How can we induce people to look after their health?

2. How do societies create effective and resilient institutions, such as governments?

3. How can humanity increase its collective wisdom?

4. How do we reduce the ‘skill gap’ between black and white people in America?

5. How can we aggregate information possessed by individuals to make the best decisions?

6. How can we understand the human capacity to create and articulate knowledge?

7. Why do so many female workers still earn less than male workers?

8. How and why does the ‘social’ become ‘biological’?

9. How can we be robust against ‘black swans’, rare events that have extreme consequences?

10. Why do social processes, in particular civil violence, either persist over time or suddenly change?


The approach is not foolproof, though. One recent large-scale
experiment, aimed at financially reward- 
ing low-income New York City families for 
keeping children in school and taking regu- 
lar medical check-ups, was halted last year 
after it produced only limited improvements 
(see go.nature.com/eunolm). Oster says that 
researchers need to experiment with differ- 
ent reward systems, as it is not clear how the 
best system for a particular problem should 
be chosen. 

Nick Bostrom, a philosopher at the Univer- 
sity of Oxford, UK, who was also involved in 
compiling the Harvard list, wants social science 
to improve society’s “ability to get the impor- 
tant things approximately right”. He notes that 
judgements by specialists are often no better 
than those made by laypeople’, and suggests 
that, rather than relying on individuals, society 
should develop and exploit new methods for 
aggregating knowledge. 

In financial markets, for example, par- 
ticipants buy and sell shares on the basis of 
expectations about how the market will move. 
If enough people play the market, the price of 
the shares reflects traders’ collective beliefs 
about future events. It is also possible to cre- 
ate artificial markets in which traders buy and 
sell shares related to specific events, such as a 
politician being elected. These markets also 
reflect traders’ beliefs about outcomes, and can 
be good forecasting tools’. Bostrom would like 
to see such ‘prediction markets’ trialled more
widely; they could assist with corporate deci- 
sions, such as whether to replace a company 
chief executive, he suggests. 

Harvard social scientist Nicholas Christakis
hopes to understand how physiological
and psychological attributes,
such as obesity and loneliness, can 
spread through a social network like a 
contagious disease, a phenomenon he 
has studied with James Fowler at the 
University of California, San Diego’. 
If one of your friends becomes obese,
for example, your own chances of 
putting on weight increase. Christakis 
notes that there is unlikely to be a 
single theory that links social and 
biological factors. In the case of 
obesity, it may be that having an 
overweight friend somehow nor- 
malizes the idea of gaining weight. 
And preliminary work in poor 
neighbourhoods in Chicago sug- 
gests that loneliness and fear of crime 
can alter levels of stress hormones, 
which in turn can affect people's risk 
of cancer. 

Nash and Kosslyn hope that draw- 
ing attention to difficult and impor- 
tant problems will motivate young 
researchers to work on them, just as 
young mathematicians were attracted 
to the Hilbert problems. “Nothing 
would make us happier than to see 
future grant applications that mention the 
‘Harvard problems’,” says Nash. The similarity
with the NSF’s own exercise is not a bad
thing, adds Myron Gutmann of the founda- 
tion’s directorate for social, behavioural and 
economic sciences in Arlington, Virginia. “I'm
delighted by [the Harvard exercise],” he says.
“It allows us to look for repeated themes.”

The NSF received more than 240 responses 
to its request for forward-looking ideas, which 
Gutmann plans to discuss with his advisory 
committee. “I can imagine that in the next two 
years we will identify a few ideas that seem 
especially important and invest in them, in 
the form of pilot projects and planning grants, 
with an idea that these investments will posi- 
tion us to make more significant investments 
5-10 years from now,” he says.

Cary Cooper, chair of the Academy of Social 
Sciences in London and a psychologist at Lan- 
caster University, UK, is enthusiastic about 
the Harvard list. If funding were available to 
support work on the problems, he says, young 
researchers might feel confident enough to 
eschew simpler questions. Cooper adds that 
he will consider asking Britain’s Economic 
and Social Research Council to run a similar 
exercise.

[Figure: Kepler search space, 900 parsecs. The Kepler space telescope is exploring a sliver of the Milky Way some 900 parsecs (about 3,000 light years) deep.]


THE STARS


Launched in 2009 to seek out worlds beyond the Solar System, the Kepler
mission is exceeding expectations. Is it closing in on another Earth?


BY EUGENIE SAMUEL REICH


Sitting for an interview in his office at the Harvard-Smithsonian Center
for Astrophysics (CFA) in Cambridge, Massachusetts, the normally voluble
astronomer Dimitar Sasselov looks nervous. Asked for his favourite
among the many potential planets discovered by NASA's Kepler planet-finding
mission, for which he is a co-investigator, he hesitates, then
sidesteps the question entirely. “Personally, I'm already beyond that point.
It’s not one. It’s not a single planet. It’s a whole family.”

Sasselov has good reason to be wary: his public lecture last July at
the Technology, Entertainment and Design 2010 conference in Oxford, 
UK, earned him a stern rebuke from his colleagues on the mission. Not 
only had he presented numbers for possible planets greater than those 
released officially by the team, they said, but he had also used a careless 
phrasing that resulted in a raft of headlines proclaiming — incorrectly 
— the discovery of hundreds of other Earths. 
That hullabaloo became a distant memory this week, as the 
mission released 400 candidate systems, adding to the 306 released 
last June. Along with the candidates came a bunch of confirmed 
planets. The latest finds, posted on NASA’s website last month 
(see go.nature.com/aejd15) and published in Nature this week¹, include
a rocky planet orbiting so closely to its star that its starlit side must be 
a seething sea of lava; and a planetary system containing several large, 
rocky or icy planets in orbits of tens of days, just one order of magnitude 
faster than Earth’s 365-day cycle. “It’s very exciting. It’s a type of system 
we haven't seen before,” says Jack Lissauer, a space scientist and Kepler
co-investigator based at the NASA Ames Research Center in Moffett 
Field, California, and a lead author on the paper in Nature. 

Nonetheless, most of the Kepler scientists continue to be cautious. 
By watching the light from some 150,000 stars for the dimming that 
could signal a planet crossing in front of them, Kepler is extraordinarily 
efficient at finding possible planets. But Kepler has yet to find another 
Earth — a small, rocky planet with an orbit of a few hundred days and 
well inside the habitable zone in which water can exist and life can arise. 
That is for a fundamental reason; the blips that Kepler detects show only 
the radius, and not the mass, of an observed planet, which means that 
the density and composition generally remain unknown. 

Moreover, the scientific objective of the Kepler mission is not to 
discover Earth-like planets. Instead, it is to estimate the fraction of Sun- 
like stars that have Earth-like planets — statistics that could greatly 
enhance astronomers’ understanding of how planetary systems form. 
Determining which of the blips correspond to planets — rather than 
systems of stars in which one is eclipsed, causing a similar dimming — is 
what the researchers spend most of their time on, says William Borucki, 
a space scientist and Kepler principal investigator at NASA Ames. The 
only way to do that, he says, is the hard way: painstakingly sorting the 
real signals from the false positives. 


OBSERVATIONAL BIAS 
Until Kepler, the leading detection method used to discover exoplanets 
— planets outside the Solar System — was much more likely to find 
giant planets, resulting in a sampling bias. Known as radial velocity 
or Doppler spectroscopy, the 
method depends on identifying 
the shift in a star’s spectral lines 
as it wobbles around a mutual 
centre of gravity with a planet. 
The larger the planet and the 
closer to the star it lies, the faster 
the star’s movement towards and 
away from Earth, and the easier it
is to detect the shift in the spectral
lines. Almost all of the planets
found by this technique have been
larger than Jupiter and very close
to their stars, sometimes completing
an orbit in just a few days.

An alternative method was 
presented in 2000, when CFA 
astronomer David Charbon- 
neau and his colleagues, work- 
ing from a shed in a car park 
outside the National Center for 
Atmospheric Research in Boul- 
der, Colorado, observed a planet 
passing across — or transiting — 
the face of its parent star’. Within
days, another group had made 
a similar observation’. In this 
case, the researchers were con- 
firming a transit predicted for a planet, HD 209458b, that had been 
spotted using the radial-velocity method. 

Before long, planets were being detected by their transits alone. Those 
early detections also yielded large, close-in planets, which were easier to 
see because they obscured larger portions of their host stars than their 
Earth-size counterparts would do. But researchers were thrilled to realize
that, in principle, a space telescope could be made sensitive enough to
see the transits of Earth-sized planets in Earth-like orbits, and the idea
of Kepler was born.

[Figure: ‘No place like home’. Kepler looks for exoplanets with masses and orbits similar to Earth’s. Most of the discovered exoplanets are very different, but the Kepler-11 planets (estimates for only five are exact enough to plot) are among the closest yet.]

Kepler was designed as a 0.95-metre-diameter space telescope 
that would detect exoplanets by monitoring variations in the light 
from stars. Unlike most ground-based telescopes and the French 
Space Agency’s COROT planet-finder mission, which monitor targets
for months at a time at most, Kepler was intended to stare at the
same, fixed field of view for 3-4 years. This field encompasses
150,000 stars in the Cygnus and Lyra constellations, chosen because
of a prevalence of Sun-like stars. The commitment to the same star
field made Kepler unique in being able to capture three or four repeat
observations of transits by small planets in yearly, Earth-like orbits,
even if it wouldn't be able to determine their mass and composition.

The spacecraft was launched in March 2009, and the Kepler scientists 
announced their first few planets the following January. “We were just 
skimming the cream off the top,” says Natalie Batalha, an astronomer at 
San Jose State University in California and Kepler’s deputy science team 
leader. At that point, the short timescale over which the mission had 
been operating continued to favour the discovery of giant planets with 
fast orbits that were too close to their host stars to be habitable. These 
included five giant planets with orbits of between 3.2 days and 4.9 days’. 
But the 306 planetary candidates released a few months later told a 
different story. Most of these correspond to planets that are Neptune- 
sized or smaller, and nearly 40 are smaller than twice Earth’s size’. If 
confirmed, Batalha estimates, five would correspond to planets orbiting 
within the habitable zones of their stars. 

The process of turning a can- 
didate into a confirmed planet 
is tortuous. Every month, pixels 
representing a continuous flux 
of light captured from the tar- 
get stars are downloaded from 
the spacecraft to computers at 
NASA Ames, where they are 
converted into light curves — 
graphs showing the intensity of 
light from the star as it changes 
with time. The software flags up 
about 2,000-3,000 dips in light 
curves automatically, and these 
are then sent to a committee 
headed by Batalha. Those not 
rejected as obvious false positives 
— owing to instrument noise, for 
example — are assigned a Kepler 
object of interest (KOI) number. 
Mission scientists estimate that 
50% of the KOIs are real plan- 
ets, but they have been able to 
confirm only 15 of the 306 KOIs 
announced so far (see ‘No place 
like home’) — including those 
published in this issue. 
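
The automatic flagging of dips can be illustrated with a toy example. The sketch below is not the Kepler pipeline, whose algorithms this article does not describe. It simply marks points in a synthetic light curve that fall well below the median brightness, using a robust estimate of the scatter. The function name `flag_dips`, the 5-sigma threshold and the synthetic data are all assumptions made for illustration.

```python
from statistics import median


def flag_dips(flux, n_sigma=5.0):
    """Return indices of points lying more than n_sigma below the median flux.

    The scatter is estimated from the median absolute deviation (MAD),
    which is less sensitive to the dips themselves than a plain
    standard deviation would be.
    """
    baseline = median(flux)
    mad = median(abs(f - baseline) for f in flux)
    sigma = 1.4826 * mad  # converts MAD to a Gaussian-equivalent sigma
    if sigma == 0:
        return []
    return [i for i, f in enumerate(flux) if f < baseline - n_sigma * sigma]


# Synthetic light curve: flat star with small deterministic "noise"
# and a hypothetical 0.2% transit-like dip at samples 20 to 22.
flux = [1.0 + 0.0001 * ((i * 7) % 5 - 2) for i in range(50)]
for i in (20, 21, 22):
    flux[i] -= 0.002

print(flag_dips(flux))
```

A real survey would also have to reject dips caused by instrument noise or eclipsing stars, which is the manual vetting stage the committee performs.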

The most obvious way to rule 
out a false positive is to detect the planet using another method. Char- 
bonneau, now a participating scientist on Kepler, is working to follow 
up the KOIs with NASA's Spitzer Space Telescope, which is sensitive to
infrared radiation. It is ideal for ruling out a type of false positive known 
as a blend, which consists of a much brighter star in the same line of 
sight as two dimmer, orbiting stars, so that one occasionally eclipses 
the other. To Kepler, a blend can look like the transit of a Jupiter-sized
planet, says Charbonneau — but not to Spitzer, because the three stars 
will have different proportions of their light in the infrared and visible 
wavelengths. 

Kepler data alone can confirm a candidate when it is part of a system 
of several planets. Last year, for example, researchers found a planetary 
system with a pair of planets transiting the same star, at almost regular
intervals. “We started to lavish more individual attention on the
system once we saw the transits were varying,” says Matthew Holman,
an astrophysicist at the CFA and a member of the Kepler team. It was
a sign that the planets were real: transit times can vary by as much as
several minutes per orbit if the planets are interacting gravitationally with
one another. In a rapid orbit, this is the equivalent of Earth’s revolution
around the Sun changing by a few hours each year. The new planets,
dubbed Kepler-9b and -9c, had radii about 0.8 times the size of Jupiter’s, 
and orbits of 19 days and 39 days, respectively®. By modelling the 
gravitational interactions of the planets, the team calculated that their 
probable masses are similar to that of Saturn. 
Knowing the mass and radius of each planet 
helped the astronomers to estimate that the 
composition of the planets was hydrogen- 
and helium-rich, making them very much 
like the gas giants Saturn and Jupiter. 
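
The comparison between a shift of minutes in a short orbit and a shift of hours in Earth's year is a simple proportion, sketched below. The 19-day period is the one quoted for Kepler-9b; the 4-minute and 10-minute shifts are assumed values chosen to represent "several minutes", and the function name is hypothetical.

```python
MINUTES_PER_DAY = 24 * 60
HOURS_PER_YEAR = 365.25 * 24


def equivalent_shift_hours(shift_minutes, period_days):
    """Scale a per-orbit timing shift to the same fraction of a 365.25-day year."""
    fraction_of_orbit = shift_minutes / (period_days * MINUTES_PER_DAY)
    return fraction_of_orbit * HOURS_PER_YEAR


# Assumed timing shifts for a planet on a 19-day orbit.
for shift in (4.0, 10.0):
    hours = equivalent_shift_hours(shift, 19.0)
    print(f"{shift:.0f} min per 19-day orbit ~ {hours:.1f} h per Earth year")
```

Under these assumptions the equivalent shift comes out at between one and a few hours per year, which is consistent with the comparison in the text.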


Kepler-9 was the first 


system in which several 
planets were found to 
transit the same star. The 
paper on page 53 (ref. 1) 
announces a second, 
Kepler-11, in which as 
many as six planets transit, 
with orbital periods of 10, 
13, 22, 31, 46 and 118 days, 
and masses between 2.3 and 
more than 300 times that of 
Earth. Although the outer 
four are gas giants, the inner 
two could be ice giants like 
Neptune. Or they could be 
super-Earths — planets 
several times larger than 
Earth but consisting of a mixture of rock and 
gas. The team was surprised to see as many as 
six planets transiting in the same system, says 
Lissauer, and further astounded to find the 
inner planets so densely packed that, were 
they any more so, their orbits would not be 
stable. “It’s an amazing system,” he says. 


SUBTLE EFFECTS 

Kepler-11b-g come hot on the heels of the January announcement of 
Kepler-10b, a dense planet circling a Sun-like star in a 0.84-day orbit. In 
this case, there were no observable transit-timing variations, but because 
of the closeness to the star, the researchers were able to confirm the planet 
using ground-based radial-velocity observations that also showed the 
planet’s mass. At 4.6 times Earth’s mass and only 1.4 times its radius, the 
planet is dense enough to be unambiguously rocky — although its close- 
ness to the star means that one side will be constantly molten rock. After 
careful studies of the host star and analyses of a year’s worth of data, the 
Kepler team was able to pick out not only the dimming due to the transit, 
but also the cycle of brightening and dimming as the orbiting planet 
alternately showed its day and night sides towards
Earth. “It’s phenomenal” that such a subtle effect
was detectable, says Batalha. It also shows what
unexpected insights can be gleaned from Kepler’s
observations. “This is our first data point down in
the rocky regime,” says Batalha, “it’s a huge milestone.”

[Figure: Kepler’s full field of view, which encompasses 100 square degrees of sky and 150,000 stars.]

NATURE.COM: For more on exoplanets, visit go.nature.com/ibz7ft
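
The claim that Kepler-10b is dense enough to be rocky follows from the mass and radius quoted above. As a rough check, density scales as mass divided by radius cubed, so the figures of 4.6 Earth masses and 1.4 Earth radii can be turned into a bulk density. The value used for Earth's mean density (about 5.51 g/cm³) is a standard figure that does not come from this article, and the function name is hypothetical.

```python
EARTH_MEAN_DENSITY_G_CM3 = 5.51  # standard reference value, not from the article


def relative_density(mass_earths, radius_earths):
    """Bulk density in units of Earth's, from mass and radius in Earth units."""
    return mass_earths / radius_earths ** 3


kepler10b_relative = relative_density(4.6, 1.4)
kepler10b_density = kepler10b_relative * EARTH_MEAN_DENSITY_G_CM3

print(f"Kepler-10b: {kepler10b_relative:.2f} x Earth's density, "
      f"about {kepler10b_density:.1f} g/cm^3")
```

The result is roughly 1.7 times Earth's density. A planet made mostly of gas or ice would come out well below Earth's value, which is why the mass measurement matters so much.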

The Kepler mission has detected other planets for which the mass 
cannot be determined. One example is Kepler-9d, a super-Earth 
found in the same system as Kepler-9b and -9c. Computer simulations 
by Guillermo Torres, an astronomer at the CFA, and his colleagues 
showed that eclipsing stars could not produce as good a fit to the 
observed light curve as the match produced by a super-Earth planet 
passing in front of the Kepler-9 star. That, says the team in a paper 
in The Astrophysical Journal’, is the first validation of a planet using 
a general method that could be applied to any of the Kepler candi- 
dates, even those that don’t show transit-timing variations and are 
too far from the star to be studied using the radial-velocity method. 
“The probability of a planet is higher than the probability of a false 
positive,” says Torres, “there is a statistical argument.” The method 
was used again to validate the sixth of the Kepler-11 candidates, 
Kepler-11g, a gas giant with an orbit of 118 days, far enough from 
the rest of the cohort that any transit-timing variations are too subtle 

to have been observed. 


THE KEPLER LEGACY 

Because of the time needed for repeat observations of planets in
Earth-length orbits, it will be years before the Kepler researchers
can establish the frequency of planets in the cosmos. But that hasn't
prevented other scientists from making preliminary estimates. In 2010,
for example, a group led by Andrew Howard, an astronomer at the
University of California, Berkeley, took the size distribution of planets
found by the radial-velocity method and, by extrapolating to lower
masses, predicted that Kepler will find that roughly 22% of stars are
orbited by Earth-size planets’.

Borucki is sceptical. “They extrapolated, which is not a mortal sin but
it’s close,” he says. But Lissauer is more sanguine. Ultimately, he
says, results from the radial-velocity method can be combined with
results from the transit method to produce a measured frequency of
planets at different sizes, masses and compositions, from rocky Earths
to gaseous Jupiters. And those data, in turn, will be invaluable for
helping astronomers to understand the origin and evolution of planetary
systems throughout our Galaxy. “There's a whole load of good science
in there,” says Lissauer.

The promise, says Batalha, is that Kepler will deliver a massive
roster of objects for future generations to follow up on. “Kepler will
leave this legacy. People are going to use these data for decades,” she
says. SEE EDITORIAL P.5


Eugenie Samuel Reich is a reporter for Nature based in Boston. 

EXOPLANETS
ON THE CHEAP 


The search for planets outside our Solar System will always be pricey. But creative 
solutions are proving that it no longer has to break the bank. 


BY LEE BILLINGS 


Astronomers searching for planets around stars other than the Sun have
had much to celebrate over the past decade. The number of confirmed 
‘exoplanets’ has soared from about 50 to more than 500 in that time. 
And although none of these planets closely resembles Earth, NASA's 
Kepler space telescope, launched in 2009, is now delivering candidates 
from distant stars by the hundreds — some of which may prove to be 


very Earth-like indeed (see page 24). 

The exoplanet search itself has been wildly successful, but not so the searchers’ quest for
multibillion-dollar follow-up missions. Hopes for ambitious spacecraft such as a Space 
Interferometry Mission or Terrestrial Planet Finder have been dashed as missions have 
been cancelled or postponed owing to a combination of sluggish economic growth, deep 
cuts to space-science funding and programme difficulties with NASA’s James Webb Space 
Telescope (JWST). 

In response, the planet-hunting community has got creative, devising ways to maxi- 
mize the science and minimize the costs. An Exoplanet Task Force jointly commissioned 
by NASA and the US National Science Foundation accordingly issued a report’ in 2008, 
supporting a new strategy for exoplanet research. Rather than waiting for the launch of 
costly, dedicated planet-hunting spacecraft, it calls for astronomers to press ahead with 
cheaper, ground-based surveys to discover worlds orbiting nearby stars, which appear 
brighter to us than do those farther away, and so are easier to study. The hope is that such 
low-cost surveys will yield at least a few worlds that can be studied using space-based 
resources such as the JWST. Such facilities would allow astronomers to spectroscopi- 
cally search the exoplanets’ atmospheres for ingredients such as carbon dioxide, water 
vapour and perhaps methane, oxygen and other trace gases, which could indicate that 
life is present. A 2010 report from the European Space Agency reached nearly identical 
conclusions’. 

“The planets are out there, and it’s relatively inexpensive to go after them,” says Greg
Laughlin, an astrophysicist at the University of California, Santa Cruz, who served on 
the NASA/National Science Foundation task force. “There’s an economic inevitability 
to this.” 

Of the many ideas that astronomers have come up with for conducting exoplanet 
searches on the cheap, five stand out. 

PLANET-HUNTING FOR BEGINNERS


Of the handful of techniques for finding exoplanets — planets orbiting stars 
other than our Sun — two are by far the most productive. 


RADIAL VELOCITY 


Radial velocity is the motion of 
a star caused by the 
gravitational influence of its 
orbiting planets. It can be 
measured through increases 
(blueshifts) or decreases 
(redshifts) in the frequency of 
light the star emits. Radial- 
velocity measurements can 
detect only planets whose orbits 
tug the star towards and away 
from the observer. The exact 
orbit of an exoplanet is hard to 
determine, so radial-velocity 
measurements let researchers 
deduce only the time a planet 
takes to orbit the star (its orbital 
period), how its orbit deviates 
from circular and its minimum 
mass. Radial velocity is most 
sensitive to massive planets 
with short orbital periods. 


M-DWARF TRANSIT SURVEYS (US$2 MILLION)


Central to the strategy is a focus on cool, red ‘M-dwarf’ stars close to the 
Solar System. Not only are there lots of them — M-dwarfs are the most 
abundant kind of star in the Milky Way — but they are much smaller 
and dimmer than the Sun, having less than half its mass. So any M-dwarf 
planet passing in front of the star, or ‘transiting’ it, would block a larger 
fraction of the light than it would of a larger star and would be easier to 
detect (see ‘Planet-hunting for beginners’). The comparative size of the 
transiting planet's silhouette would also make it easier for telescopes to 
gather light filtering through its atmosphere for spectroscopic analysis. 
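
The advantage of a small star can be put in numbers. The fraction of light blocked during a transit is approximately the ratio of the planet's disc area to the star's, that is, the square of the ratio of their radii. The sketch below compares an Earth-sized planet crossing the Sun with the same planet crossing an M-dwarf. The M-dwarf radius of 0.3 solar radii is an assumed, illustrative value that does not come from this article, and the function name is hypothetical.

```python
R_SUN_KM = 695_700.0
R_EARTH_KM = 6_371.0
M_DWARF_RADIUS_KM = 0.3 * R_SUN_KM  # assumed illustrative M-dwarf size


def transit_depth(planet_radius_km, star_radius_km):
    """Approximate fraction of starlight blocked at mid-transit."""
    return (planet_radius_km / star_radius_km) ** 2


depth_sun = transit_depth(R_EARTH_KM, R_SUN_KM)
depth_mdwarf = transit_depth(R_EARTH_KM, M_DWARF_RADIUS_KM)

print(f"Earth-size planet, Sun-like star: {depth_sun:.2e}")
print(f"Earth-size planet, M-dwarf:       {depth_mdwarf:.2e}")
print(f"Signal is {depth_mdwarf / depth_sun:.0f} times deeper for the M-dwarf")
```

Because the depth goes as the inverse square of the stellar radius, even a modest reduction in star size produces a much stronger signal.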

The first, and so far most successful, search for potentially habitable 
planets transiting M-dwarfs is the MEarth Project (pronounced ‘mirth’):
a cluster of eight 0.4-metre robotic telescopes at the Whipple Observatory 
on Mount Hopkins in Arizona. Unlike all previous transit surveys, which 
stare at a fixed patch of sky rich with stars, MEarth targets 2,000 nearby 
M-dwarfs; only if one of these displays a candidate transit will all
telescopes observe it at once. The project is headed by David Charbonneau,
an astronomer at Harvard University in Cambridge, Massachusetts, and 
was designed mainly by Philip Nutzman, now a postdoctoral researcher in 
astronomy at the University of California, Santa Cruz. MEarth announced 
the discovery of its first transiting planet in 2009 — a world dubbed 
GJ 1214 b, after the M-dwarf star it orbits some 13 parsecs from Earth’. 
The planet is too large and hot to harbour life as we know it, but was found 
in only the first six months of MEarth’s proposed three-year running time, 
and so far remains the most easily studied Earth-like exoplanet known. 
A spectroscopic study of GJ 1214, undertaken last year at the European 
Southern Observatory (ESO) in La Silla, Chile, showed that the planet's 
upper atmosphere is either very hazy or is composed of water vapour’. 

“MEarth shows that for a relatively modest investment of US$1 million 
or $2 million, you can put together a ground-based survey capable of 
finding habitable-zone super-Earths,” says Charbonneau, referring to 
rocky planets that are larger than Earth and orbit their stars at a distance 
at which water can exist as a liquid. “The answer to the question being 
asked is certainly worth a lot more than that.”

By October 2011, Charbonneau and his colleagues hope to have a copy of
MEarth operating in Chile, where it will see parts of the sky not visible from
Arizona. Several other M-dwarf transit searches are also under way, notably
at the 0.6-metre TRAPPIST telescope in La Silla, and at the 1.22-metre
Samuel Oschin Telescope at Palomar Observatory in California.


TRANSITS


When a planet crosses, or ‘transits’, the face of its star, it
dims the star’s light by a small but detectable amount.

The probability that any planet’s transit will be visible from Earth is
low, and is dictated by the ratio of the diameter of the star to the
diameter of the planet’s orbit. Large planets with short-period
orbits of small stars are most likely to be seen transiting, and
lots of stars must be surveyed for any transits to be found. Transits
let researchers deduce the radius of a planet and its orbital period.
Astronomers can sometimes study a planet’s atmosphere as
starlight filters through or reflects off it. This gives information on
atmospheric composition, temperature and cloud formation.
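
The geometric transit probability described in the ‘Transits’ box, the ratio of the star's diameter to the diameter of the planet's orbit, explains why transit surveys must watch so many stars. The sketch below works it out for a Sun-like star, assuming circular orbits and randomly oriented systems. The 0.05-astronomical-unit orbit is an assumed example of a close-in planet, and the function name is hypothetical.

```python
R_SUN_KM = 695_700.0
AU_KM = 149_597_870.7


def transit_probability(star_radius_km, orbit_radius_km):
    """Chance that a randomly oriented circular orbit shows transits.

    Equal to stellar diameter / orbital diameter, i.e. R_star / a.
    """
    return star_radius_km / orbit_radius_km


p_earth_like = transit_probability(R_SUN_KM, AU_KM)
p_close_in = transit_probability(R_SUN_KM, 0.05 * AU_KM)  # assumed hot orbit

print(f"Earth-like orbit:  {p_earth_like:.2%}")
print(f"0.05 AU orbit:     {p_close_in:.1%}")
print(f"Stars per detectable Earth-like transit: about {1 / p_earth_like:.0f}")
```

Even if every Sun-like star hosted a planet in an Earth-like orbit, only about one in two hundred would be aligned so that the transit is visible, whereas close-in planets are roughly twenty times more likely to line up.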


NEAR-INFRARED SPECTROMETERS (US$5 MILLION) 


Although transit observations are a good way to determine the radii and 
orbital periods of exoplanets, other techniques, such as spectroscopy, 
are essential for learning more. 

Particularly important is the radial-velocity technique, the planet- 
hunting method that has produced the most hits so far. An orbiting 
planet tugs its star to and fro, generating periodic shifts in the wave- 
lengths of the light the star emits; measurements of these shifts can not 
only allow independent confirmation of an exoplanet’s existence, but
also provide estimates of its mass. 

The method presents another good reason to focus on M-dwarfs. 
Earth’s motion around the Sun causes the star to wobble with a radial 
velocity of some 10 centimetres per second over the course of a year — a 
tough signal for any alien astronomers to detect. But if a planet the size
of Earth were located in the habitable zone of an M-dwarf, much closer 
to the star, its radial-velocity signature would be a metre per second, 
much easier to see. Unfortunately, M-dwarfs shine most brightly with 
infrared and near-infrared light, so that is the region of the electromag- 
netic spectrum in which planet-hunters must search — but astronomers 
have yet to build the infrared spectrometers required for such precise 
measurements. So only a handful of the myriad M-dwarfs close to the 
Solar System have been surveyed for habitable planets. 
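
The two radial-velocity figures quoted above can be reproduced to order of magnitude with the standard two-body result for a circular orbit seen edge-on. The sketch below is only an estimate under stated assumptions: the M-dwarf mass of 0.2 solar masses and the habitable-zone distance of 0.05 astronomical units are illustrative values that do not come from this article, and the function name is hypothetical.

```python
import math

G = 6.674e-11        # gravitational constant, m^3 kg^-1 s^-2
M_SUN = 1.989e30     # kg
M_EARTH = 5.972e24   # kg
AU = 1.496e11        # m


def rv_semi_amplitude(m_planet, m_star, orbit_radius):
    """Stellar reflex speed (m/s) for a circular, edge-on orbit."""
    return m_planet * math.sqrt(G / ((m_star + m_planet) * orbit_radius))


k_earth_sun = rv_semi_amplitude(M_EARTH, M_SUN, AU)
# Assumed M-dwarf: 0.2 solar masses, habitable zone at 0.05 AU.
k_earth_mdwarf = rv_semi_amplitude(M_EARTH, 0.2 * M_SUN, 0.05 * AU)

print(f"Earth around the Sun:       {k_earth_sun * 100:.0f} cm/s")
print(f"Earth-mass, M-dwarf HZ:     {k_earth_mdwarf:.2f} m/s")
```

With these assumed values the signal grows about tenfold, from roughly 9 centimetres per second to roughly 0.9 metres per second, because both the lower stellar mass and the tighter orbit increase the star's wobble.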

Worse, planet-hunting near-infrared spectrometers are more costly 
than their optical counterparts, owing to basic physics: infrared photons 
don't have enough energy to easily excite electrons in an off-the-shelf 
silicon detector. So the instruments rely instead on 
detectors made from expensive, exotic materials 
such as indium gallium arsenide or mercury cad- 
mium telluride, and must be either cryogenically 
cooled or thermally insulated against background
infrared radiation. The most cutting-edge near-infrared spectrometer
is ESO’s Cryogenic High-resolution Infrared Echelle Spectrograph 
(CRIRES), which cost some €10 million (US$13.6 million) to build. 
Yet prices are dropping, and several major spectrometers may debut in 
the next few years, if they receive sufficient funding. Among them are the 
Calar Alto High-resolution Search for M-dwarfs with Exo-Earths with a 
Near-infrared Echelle Spectrograph (CARMENES), a German-Spanish 
instrument slated for the Calar Alto observatory in Spain; a near-infrared 
spectropolarimeter (SPIRou) on the Canada-France-Hawaii Telescope 
in Hawaii; and a US project, the Habitable Zone Planet Finder, slated 
for the Hobby-Eberly Telescope at the McDonald Observatory in Texas. 


LASER FREQUENCY COMBS (US$100,000) 


Another hurdle to the progress of M-dwarf planet discoveries is more subtle: 
the need for better ways to calibrate the spectrometers. The minuscule 
spectral-line shifts caused by an orbiting habitable planet can all too easily 
be mimicked by fluctuations in the stability of the instruments themselves. 
The obvious solution is to generate a reference spectrum with which the 
observations can always be compared. But the spectra generally used to 
calibrate optical radial-velocity surveys (those from iodine or a 
thorium-argon mix) don't produce usable calibration lines in the infrared. 

A number of other elements and mixtures are under investigation for 
calibration. But planet-hunters are most excited about an ultra-high-pre- 
cision technology known as the laser frequency comb. At the core of such 
a device is a laser that emits rapid pulses that can be tuned across a wide 
range of wavelengths. Plotting the frequency of such a pulse train gives 
a distinct series of regular-wavelength peaks that resemble the teeth of a 
comb. When those pulses are fed through a spectrometer and synchronized 
with the ticking of an atomic clock, the comb becomes a powerful 
calibration source for spectroscopic measurements. 
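
To make the 'comb' picture concrete, here is a minimal sketch of the usual idealization, in which tooth n sits at frequency f_offset + n × f_rep, so the teeth are evenly spaced in frequency. The 25-gigahertz spacing, the zero offset and the tooth indices are assumed values for illustration and do not describe any of the instruments named in this article. The second function shows how small the target signal is: the wavelength shift produced by a 1-metre-per-second stellar wobble.

```python
C = 299_792_458.0  # speed of light, m/s


def comb_teeth_hz(f_rep_hz, f_offset_hz, first_index, count):
    """Frequencies of `count` consecutive comb teeth, starting at `first_index`."""
    return [f_offset_hz + n * f_rep_hz
            for n in range(first_index, first_index + count)]


def doppler_shift_nm(wavelength_nm, velocity_m_s):
    """Non-relativistic Doppler shift of a spectral line, in nanometres."""
    return wavelength_nm * velocity_m_s / C


# ASSUMED comb parameters: 25 GHz spacing, zero offset, teeth near 1,000 nm.
teeth = comb_teeth_hz(25e9, 0.0, 12_000, 5)
spacings = [b - a for a, b in zip(teeth, teeth[1:])]
wavelengths_nm = [C / f * 1e9 for f in teeth]

# Shift of a 1,000 nm line for a 1 m/s radial velocity.
shift = doppler_shift_nm(1000.0, 1.0)

print([f"{w:.3f}" for w in wavelengths_nm])
print(f"1 m/s shift at 1,000 nm: {shift:.2e} nm")
```

The shift is a few millionths of a nanometre, which is why a reference with precisely known tooth positions matters more than raw spectral resolution.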

Efforts are under way to test laser combs 
on spectrometers at observatories. In late 
2009 and early 2010, for example, a laser 
comb developed at the Harvard-Smithsonian 
Center for Astrophysics in Cambridge, 
Massachusetts, was linked to an optical 
spectrometer at the Whipple Observatory 
and used to calibrate spectroscopic meas- 
urements of a known planet-hosting binary 
star, HD 189733. In mid-2010, at the Hobby- 
Eberly Telescope, a comb developed at the US 
National Institute of Standards and Technology 
and mounted on a near-infrared spectrometer 
from the University of Pennsylvania, Philadelphia, obtained radial-veloc- 
ity measurements of the planet-hosting star Upsilon Andromedae. And 
in December 2010, at ESO, a comb from the Max Planck Institute of 
Quantum Optics in Garching, Germany, made radial-velocity measurements 
of an exoplanet that were, for the first time, more accurate than the 
previous champion, thorium-argon calibration. The results of these tests 
are unpublished. If all goes well, combs from each team will grace next- 
generation spectrometers at major observatories in this decade. 


RADIAL-VELOCITY OBSERVATORIES (US$50 MILLION) 


Even armed with laser combs, planet-hunters could still be undone by 
the stars themselves, whose surface motions can masquerade as radial- 
velocity signals. “A star reverberates like a bell, with millions of modes 
of harmonic oscillations covering its surface, a bit like the weird patterns 
you get from putting sand on a vibrating drum head,” says Steven Vogt, 
an astronomer at the University of California, Santa Cruz. “A few of 
these modes don't average out across the surface of the star, and they give 
you oscillations that can show up as noise in your observations.” 

The technique that has emerged to counter such noise sources is to 
average together 10-15-minute time-exposures of the star taken on con- 
secutive nights over a period of weeks. Stéphane Udry, an astronomer at 
the University of Geneva in Switzerland, who hunts for planets at La Silla, 
says that it works. “We have a little sample of ten nearby stars we've begun 
following in this way, and we've already found planets around three of 
them,” he says. “But a lot of observations are needed, because the stars 
are unlikely to have only one planet. So we have to cover all the potential 
periods for multiple planets, which takes time. As you try to make your 
measurements more precise, it quickly becomes expensive.” 
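
The reason averaging helps, and the reason it gets expensive, can be sketched with the textbook result that the scatter of a mean of N independent measurements falls as 1/√N. The example below treats the stellar noise as independent Gaussian scatter of 1 metre per second per exposure. Both are simplifying assumptions made for illustration: real stellar noise is correlated in time, and this is not the observers' actual pipeline.

```python
import math
import random
import statistics


def standard_error(single_exposure_sigma, n_exposures):
    """Expected scatter of the mean of n independent exposures."""
    return single_exposure_sigma / math.sqrt(n_exposures)


def simulate_scatter_of_means(sigma, n_exposures, n_trials=2000, seed=1):
    """Monte Carlo check: scatter of the averaged velocity over many trials."""
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.gauss(0.0, sigma) for _ in range(n_exposures))
        for _ in range(n_trials)
    ]
    return statistics.pstdev(means)


SIGMA = 1.0  # ASSUMED noise per exposure, m/s
for n in (1, 25, 100):
    print(n, round(standard_error(SIGMA, n), 3),
          round(simulate_scatter_of_means(SIGMA, n), 3))
```

Halving the noise takes four times as many exposures, which is one way to read Udry's remark that precision quickly becomes expensive.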

So expensive, in fact, that Vogt says the best way to reduce the long- 
term cost is to spend more money in the short-term on building radial- 
velocity-dedicated observatories. “The coin of the realm is observing 
nights,” he says. “It’s not new technology; it’s not laser combs or some 
newfangled near-infrared spectrometers that can take advantage of 
M-dwarfs. Take $50 million, which is chump change in the NASA 
regime, build a 6-8-metre telescope with enough light-gathering power 
to reach a large fraction of the nearest M-dwarfs, put a nice spectrometer 
on it and dedicate it to this work every single night of the year. You'd have 
these planets pouring out of the sky.” 

Vogt and his colleagues have built a demonstration project, the Auto- 
mated Planet Finder (APF): a 2.4-metre robotic telescope paired with a 
high-efficiency spectrometer at Lick Observatory on Mount Hamilton, 
California. The APF, according to Vogt, is “built and bred only to find 
short-period rocky planets” around nearby stars, including the brightest 
M-dwarfs in the sky. The project is now in its final installation phase, 
with commissioning scheduled for this month. Vogt expects it to rapidly 
discover a bevy of small, rocky worlds. 


EXOPLANETSATS (US$250,000 EACH) 


Radial velocity’s most important role in the future may be helping to 
verify and study promising transit discoveries. “Transit searches are the 
most advantageous technique giving access to terrestrial planets in the 
habitable zones of stars,” says Udry. “We cannot beat that.” 

Astronomers are already brainstorming 
successors to the Kepler mission, which 
would carry out transit surveys of nearby 
stars looking for worlds with the poten- 
tial for life. In the meantime, a much 
cheaper proposal is the ExoplanetSat pro- 
gramme being developed by Sara Seager, 
an astronomer at Massachusetts Institute of 
Technology in Cambridge, and her team. The 
idea is to build on the existing framework for 
‘CubeSats’ — miniaturized satellites, made up 
of varying numbers of cubes 10 centimetres 
on each side, designed to hitch low-cost rides 
into orbit on rockets launching larger spacecraft. 
Seager’s plan calls for a fleet of dozens of CubeSats, each targeting an 
individual star and containing a small telescope and guidance equip- 
ment. Outside Earth’s atmosphere, which interferes with observations, 
such a payload could detect transiting Earth-sized planets in the habit- 
able zones of nearby Sun-like stars. 

Seager admits that engineering such ‘nano-satellites’ to have the 
necessary stability and thermal control will be challenging. But she 
and her team hope to launch a functional prototype as early as 2012, 
with subsequent satellites launching for as little as $250,000 apiece — a 
bargain-basement price for a space-science mission. “On one hand, this 
seems risky because the probability of finding a transiting Earth-sized 
planet in the habitable zone of a nearby star is currently estimated at 1 in 
200,” she says. “On the other, these satellites are modular and relatively 
cheap; launching one has low risks and what may be very high returns. 
Basically, this could be MEarth in the sky.” SEE EDITORIAL P.5 
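
Seager's risk-and-return argument can be put in numbers with a short sketch. It takes the quoted 1-in-200 chance per target star and the $250,000 unit cost from the text, and assumes, purely for illustration, that each satellite's outcome is independent of the others. The fleet sizes are hypothetical.

```python
def chance_of_any_detection(p_single, n_satellites):
    """Probability that at least one of n independent targets shows a transit."""
    return 1.0 - (1.0 - p_single) ** n_satellites


P_SINGLE = 1 / 200        # per-star estimate quoted in the text
UNIT_COST_USD = 250_000   # per-satellite cost quoted in the text

# HYPOTHETICAL fleet sizes, chosen only to show the trend.
for n in (12, 36, 100):
    p_any = chance_of_any_detection(P_SINGLE, n)
    print(f"{n:3d} satellites: {p_any:.1%} chance, ${n * UNIT_COST_USD:,} total")
```

On these assumptions a fleet of three dozen satellites would have roughly a one-in-six chance of catching at least one transiting Earth-sized planet.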


Lee Billings is a freelance writer based in New York. 
Deep-sea vents are underwater hot springs, home to unique life forms and metal-rich minerals. 


Tighten regulations 
on deep-sea mining 


Extracting minerals from sea-floor vents should not go ahead without 
a coherent conservation framework, argues Cindy Lee Van Dover. 


Deposits of gold ore found along the Salmon River in the 
northwestern United States during the 1860s attracted explorers 
to the hot mineral springs of the Yellowstone Basin. Soon 
after, speculators moved in intending to fence and claim the land 
containing the hot springs. Instead, by 1872, the Yellowstone geyser 
basin was set aside as the world’s first national park. Remarkably, 
policy-makers in Washington DC, whose only knowledge of Yellowstone 
was based on photographs, paintings and stories, swiftly saw fit 
to leave this wilderness pristine for future generations. 

In the late 1970s, geologists discovered analogous, mineral-rich hot 
springs in volcanically active areas of the floor of the Pacific Ocean 
(see map). These deep-sea hydrothermal vents support bacteria that 
use chemicals in the vent fluids to generate cellular energy. The bacteria 
feed luxuriant communities of beautiful and strange invertebrates in an 
otherwise barren seascape. Scientists studying vents have gained insights 
into the cooling of Earth’s interior, ocean chemistry and the extremes at 
which life can exist on Earth and potentially elsewhere in the Universe. 
Some national governments, such as those of Canada, Portugal, Mexico 
and the United States, have introduced marine parks to protect vent 
fields of particular scientific interest within 200 nautical miles of their 
coastlines. But most vents are found in international waters, where 
there is little environmental oversight of deep-sea habitats, or in the 
territorial seas of countries with nascent or non-existent conservation 
policies that apply to deep-sea hydrothermal vents. 

With commodity prices on the rise, mining of mineral deposits at 
deep-sea vents looks set to begin in the next few years. As a scientist 
who has studied hydrothermal vents almost since their discovery, 
and as one passionate about the exquisite organisms that thrive there, 
I would prefer that sea-floor hot springs remain pristine — deep-sea 
Yellowstones — untouched by mining. But I recognize that the scien- 
tific values I assign to hydrothermal vents must be weighed against 
other values, including economic ones. 

With mining likely to be inevitable, scientists need to promote 
conservation at every level — from international governance agen- 
cies to individual mining companies. To that end I am working with 
Nautilus Minerals, a deep-sea mining company headquartered in 
Toronto, Canada, undertaking research that informs its environ- 
mental management strategies. In return I am able to tackle research 
questions that would otherwise be out of reach owing to high costs 
of field sampling in the deep sea. Some may consider such an alliance 
a Faustian pact. I disagree. 


DEEP HISTORY 
Proposals were made in the 1980s to extract mineral ores from hydro- 
thermal vents off the coast of Oregon. But questions about technical and 
economic feasibility held up sea-floor mining for more than two dec- 
ades. During this interval, advances in undersea technology — scientific 
and industrial — have yielded increasing access to the deep sea. 

At first it was scientists — American and French — who dominated 
deep-sea exploration and research, ranging the ocean depths in their 
manned submersibles Alvin and Nautile. Next, the oil and gas industry 
pushed into deeper and deeper waters, facilitated by advances in off- 
shore capabilities, and sometimes highlighting new risks and regulatory 
limitations, as seen in the Deepwater Hori- 
zon oil spill. Today many nations — includ- 
ing China, with its Jiaolong submersible that 
made its maiden dive to the bottom of the 
South China Sea in 2010 — operate state-of- 
research. 

So the technology for seabed mining 
has matured, and the metals are definitely 
there: deposits of copper-, zinc-, silver- and gold-rich ores have been 
identified at deep-sea vents in regions with moderate seas and close to 
onshore mining infrastructure. At least two mining companies (Bluewater 
Metals of Sydney, Australia, and Nautilus Minerals) are pushing 
ahead with mining exploration in territorial waters of island nations in 
the southwest Pacific Ocean. Both companies undertook exploration 
expeditions late last year — Nautilus Minerals in the waters of Papua 
New Guinea and Bluewater Metals in Solomon Islands waters. 

Last month, Nautilus Minerals was granted a 20-year mining lease 
by the government of Papua New Guinea for mineral extraction at 
a site known as Solwara 1 in the Manus Basin. The company plans 
to commence open-cut mining of Solwara 1 within the next few 
years, removing mineral ores (and organisms) to an estimated depth 
of 20-30 metres over an area equivalent to about 10 football fields. 
And in July 2011, the International Seabed Authority (ISA), which 
has jurisdiction over mineral resources in international waters, will 
review the first lease applications for exploration of sea-floor deposits 
on mid-ocean ridges. The China Ocean Mineral Resources Research 
and Development Association submitted an application last May for 
exploration of the Southwest Indian Ridge, and in late December, Russia 
submitted an application for exploration work on the Mid-Atlantic 
Ridge. Mineral exploitation will not be limited to territorial waters. 

National and international policies for conservation have not kept 
pace with mineral exploration and plans for extraction. Papua New 
Guinea’s national environment agency, for example, has not yet set 
aside sea-floor vent ecosystems for conservation in any systematic 
manner that might protect biodiversity from the effects of mining. 

Policies regulating human activities in the deep sea can be arbitrary 
and inconsistent. As a case in point, the Food and Agriculture Organi- 
zation of the United Nations lists hydrothermal vents as vulnerable 
marine ecosystems to be protected from regulated fishing. As a result, 
seamounts in the South Pacific are protected against bottom fishing. 
But mineral extraction, which has the potential to destroy the very 
same habitat, is not prohibited. 

In territorial waters, standard-setting develops in an ad-hoc man- 
ner. For example, Nautilus Minerals is working in partnership with 
scientists to establish effective environmental guidelines and collect 
baseline data. In some cases, the sampling and analysis have been 
more robust than those undertaken on academic research expeditions. 
Once mining begins, scientists may participate in monitoring and test- 
ing strategies for assessing and mitigating the impacts of mining. 

As part of its mitigation plans, Nautilus Minerals has set aside a 
temporary reserve area of similar size and character to Solwara 1 to 
serve as a possible source for natural repopulation of the mine site. 
Under the terms of its permit from the government of Papua New 
Guinea, Nautilus Minerals is obliged to meet commitments for impact 
mitigation and restoration, and responsible mine closure. 

In international waters, there are gaps — some say chasms — with 
regard to regulation, governance and conservation of special habitats in 
the deep sea, whether they are hydrothermal vents, cold seeps or deep-water 
coral reefs. Instead of leaving it to chance or to the goodwill of a 
few companies, conservation policies should become an integral part of 
international seabed regulation — before the ISA grants the first explora- 
tion and mining licences. This 15-year-old agency is also responsible for 
establishing environmental regulations to protect the marine environ- 
ment from harmful effects that might arise during resource extraction. 
Some have suggested that lodging leasing and environmental responsi- 
bilities in the same agency is akin to setting the wolf to guard the sheep. 
There can be environmental oversight at the ISA through its Legal and 
Technical Commission, which serves as an independent advisory body, and 
nations with strong conservation interests can and should ensure that 
the actions of the ISA take into account conservation objectives. 

As one of the first steps, the International Marine Minerals Society 
(IMMS) presented a Code for Environmental Management of Marine 
Mining (go.nature.com/mte4gq) to the ISA in April 2010. According to 
the ISA, the code, which was an initiative of Nautilus Minerals, is “likely 
to serve as a model for legally binding legislation on marine mining”. 

The IMMS code offers wide-ranging environmental policies for 
the management of commercial mining activities. Yet it falls short of 
providing a comprehensive conservation policy that would system- 
atically protect natural diversity, and ecosystem structure, function 
and resilience, while enabling rational use. The California Marine Life 
Protection Act is an example of one such effort to engage stakeholders, 
scientists, resource managers and members of the public in increas- 
ing the coherence and effectiveness of the state’s marine management 
through the design of Marine Protected Areas. 

Also last year, multiple stakeholders, with the support of the ISA, the 
Census of Marine Life and other agencies, developed guidelines for 
networks of reserves for chemosynthetic ecosystems, including deep-sea 
hydrothermal vents. These, or similar guidelines, need to be turned 
into regulations within the ISA or another competent body. Until these 
are in place, wholesale mining of hydrothermal vents is premature. 


UNFINISHED BUSINESS 

There are three scientific reasons for deferring wholesale commercial 
mining until proper conservation plans are enacted. First, there is 
much more to learn about hydrothermal vent systems. After three 
decades of work, researchers continue to find new vent sites in remote 
locations and new species, adaptations, behaviours and microhabitats, 
even in well-known settings. 

Most deep-sea vents are in volcanically active areas. Many are found in international waters, or in seas 
belonging to countries that are still developing deep-sea conservation policies. 

Second, there is no strategy in place to assess the cumulative impacts 
of mining. Mining one vent field may be comparable to a volcanic 
eruption or other natural process that wipes out vent communities. 
Active hydrothermal vents are subject to frequent disturbance, includ- 
ing collapse of black smoker chimneys and microearthquake activity. 
The ability of a vent community to recover from such events may 
depend on their frequency as well as their scale. Moreover, scientists 
do not yet understand how vent systems repopulate, or anything about 
the complex dynamics of neighbouring communities. The effect of 
continuous and cumulative mining operations may be very different 
from that of a single event. 

Third, we still don't know how best to mitigate mining activities 
or to restore habitats in the deep sea. Efforts by mining companies 
(such as setting aside a reserve area) during and after extraction could 
conceivably alleviate scientific concerns about cumulative effects. But 
which measures will work, and be affordable, won't be known until the 
mining is complete or until experimental studies are done. 

At this point, I believe a scientific panel would review the current 
knowledge base and mining plans for Solwara 1 favourably — with the 
advice that no further mining be initiated until ecologists understand 
how quickly the mined vent ecosystem recovers and whether the resto- 
ration strategies used by the mining company facilitated recovery. 

Marine research demands patience; expeditions are long and costly, 
and scientific answers slow in coming. However, we cannot be patient 
about effective policies to protect the sea floor. There is an urgent need 
to establish conservation guidelines before mining begins in inter- 
national waters, and to place these guidelines in functioning governance 
and regulatory frameworks. Mining codes alone are not enough. 

In states where seabed exploration is already under way, government 
agencies should act now to comply with global conservation targets, 
such as those adopted by the Convention on Biological Diversity. The 
convention has established scientific criteria to identify ocean areas 
that require enhanced protection, including hydrothermal vents. It has 
called for a global network of comprehensive, representative and effec- 
tively managed protected areas by 2012 and suggests that at least 10% 
of each of the world’s ecological regions be conserved. There is thus an 
international agreement to protect seabed vent ecosystems. 

It is easy to see what would have been lost had Yellowstone been 
turned over to miners instead of park rangers. Kilometres of overlying 
water make it harder to see what would be lost in the deep sea. There 
are creatures of extraordinary beauty down there, exquisitely adapted 
to their environment. Humans may choose to threaten these habitats 
for economic or strategic advantage, and to feed lifestyles that depend 
on relentless demand for minerals and other resources. But we should 
make these choices on the basis of an understanding of what we may 
lose as well as what we may gain. 
Al Tombaugh, the son of Pluto’s discoverer, was among those defending its planetary status in 2006. 


The planet that 
never was 


Neil deGrasse Tyson enjoys a passionate and personal 
account of the demotion of Pluto. 


It is not often that people assert they have 
committed a capital crime. The confessed 
‘felon’ in this case is astronomer Michael 
Brown of the California Institute of Technology 
in Pasadena. In How I Killed Pluto and 
Why It Had It Coming, he describes how his 
discovery of another icy body challenged 
Pluto's status as a planet. 

The first seed of Pluto's undoing was sown 
on 13 March 1930, the day its discovery was 
announced by the Lowell Observatory in 
Flagstaff, Arizona. Clyde Tombaugh, a Kansas 
farm-boy-turned-astronomer, claimed to 
have found the mythical Planet X predicted by 
his predecessor Percival Lowell. Or had he? 

If you are looking for something already 
named Planet X and you find it, then surely 
that object will get classified as a planet. And 
thus, Pluto was first reported to be at least as 
large and as massive as Earth. This was based 
not on a measurement, but on the assumption 
that Pluto’s gravity was sufficient to perturb 
Neptune’s orbit in exactly the ways observed. 

Pluto orbits sufficiently far from the Sun 
that all but the most advanced telescopes on 
Earth see it as a smudge of light. One way to 
measure its size is by occultation: from the 
time it takes Pluto to cross our line of sight to 
a distant star, astronomers can easily derive a 
lower limit for how big Pluto must be. 

The problem was, every attempt to measure 
Pluto’s shadow showed no dimming 
of the background star. Either Pluto was 
strangely transparent 
or it was much smaller 
than thought. Estimates 
for the planet's size kept shrinking 
over the decades, with 
a rapidity that led 
some wags to extrapolate 
(complete with 
a fitting function and 
graph) the exact date 
at which Pluto would 
disappear completely. 

How I Killed Pluto and Why It Had It Coming 
MIKE BROWN 

Pluto's size was not 
settled until the late 
1970s, when our view 
of its orbit allowed 
James Christy of the US Naval Observatory 
in Flagstaff to discover its moon, Charon. 
From orbits and eclipse measurements, 
planetary scientist Richard Binzel and his 
collaborators discerned the masses and sizes 
of both bodies. Pluto was tiny. An orb meas- 
uring 2,300 kilometres across, it was not only 
much smaller than Earth, it was smaller than 
seven moons in the Solar System, with a mass 
equivalent to a mere 17% of our Moon. 

Puny Pluto never justified its designation 
as the mythical Planet X. Had its true size 
been known to Tombaugh, he might never 
have classified it as a planet. The very data 
that led to its discovery were also challenged 
in 1993, when E. Myles Standish found from 
a re-analysis of archival observing logs at 
the US Naval Observatory that nothing was 
wrong with Neptune’s orbit all along. There 
was no need for a perturbing Planet X. So 
what was Pluto? 

Size alone should not be a principal factor 
in defining a planet; if it were, Earth could 
lose out too. For instance, Jupiter is 11 times 
wider than Earth, but Earth is only 5.5 times 
wider than Pluto. So Jovians might think 
Earth puny. For many astronomers, physi- 
cal and orbital characteristics matter greatly, 
but in the hearts and minds of some, size still 
matters. That is where Brown enters the pic- 
ture. In How I Killed Pluto and Why It Had It 
Coming, he explains where Pluto sits in the 
hierarchy of the Solar System and whether it 
deserves to be called a planet at all. 
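
The size comparison above is simple arithmetic, sketched here for readers who want to check it. Pluto's 2,300-kilometre diameter is the figure given in this review. The diameters used for Jupiter and Earth are approximate mean values supplied for the example and are not taken from the review.

```python
# Approximate mean diameters in kilometres.
# Pluto's value is from the review; the other two are ASSUMED reference values.
DIAMETER_KM = {
    "Jupiter": 139_820,
    "Earth": 12_742,
    "Pluto": 2_300,
}


def width_ratio(larger, smaller):
    """How many times wider `larger` is than `smaller`."""
    return DIAMETER_KM[larger] / DIAMETER_KM[smaller]


print(f"Jupiter / Earth: {width_ratio('Jupiter', 'Earth'):.1f}")
print(f"Earth / Pluto:   {width_ratio('Earth', 'Pluto'):.1f}")
```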

We learn about the 1992 discovery by 
astronomers David Jewitt and Jane Luu 
of the Kuiper belt that lies beyond Nep- 
tune — thousands of small, frozen objects 
that never coalesced from gravitational 
attraction. Pluto lives among them. Brown 
reasoned, as others had before, that if Pluto 
is large among these icy bodies yet orbits on 
the inner edge of the Kuiper belt, then larger 
objects could exist farther out that escape 
detection. He set out to find them. With the 
telescopic muscle of Caltech’s observatories 
and plenty of youthful ambition, Brown was 
poised to make the discovery that trans- 
formed his life and the fate of Pluto. 


Spiegel and Grau: 
2010. 288 pp. $25 


Part memoir and part planetary saga, 
Brown's book invites you into his office, 
his home and his head. The account of his 
hard work, long hours and lost sleep reveals a 
dedicated researcher on a mission. He 
reflects on love and passion, including a 
charming account of how he met, courted 
and married his spouse. We learn about the 
birth of his daughter and how these domestic 
elements pierce his life as a scientist. 

Brown's confessed crime is his 2005 
discovery of Eris, an icy Kuiper-belt object 
that, by early estimates, was slightly larger 
than Pluto. What should we call it? If Eris is 
not a planet then it must drag Pluto down 
with it into the ranks of non-planethood. If 
we call it a planet, then Brown becomes one 
of only four people to have discovered one. 
Even he is too modest to claim that his name 
should hang alongside William Herschel, 
discoverer of Uranus, or Johann Gottfried 
Galle, discoverer of Neptune. 

Actually, Pluto's planet status had been 
percolating for years. Diminutive size was 
only one of many factors in its demotion. 
Pluto's oddly tipped, elongated orbit and its 
icy constitution also raised eyebrows. With 
the discovery of the Kuiper belt, the need for 
an official decision grew urgent. In August 
2006, at the triennial meeting of the Interna- 
tional Astronomical Union (IAU) in Prague, 
a formal vote was taken on the definition of 
a planet. 

What emerged was simple yet devastat- 
ing to Pluto-lovers. Does the body mainly 
orbit the Sun? Is it large enough to pull its 
own mass into a sphere? Is its gravity strong 
enough for it to have (mostly) cleared its 
orbit of debris? Answer yes to all three and 
it’s a planet. Given the known existence of the 
Kuiper belt, Pluto (and Eris) would fail the 
debris-free orbit criterion. And so a new term 
was invented for round objects that orbit in 
crowded places: dwarf planet. 
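
The three-question test described above reduces to a small decision rule, sketched here as an informal paraphrase rather than the IAU's formal wording. The function name and the catch-all label 'other body' are our own placeholders, and the yes/no inputs for each object are simplifications taken from this review's account.

```python
def classify(orbits_sun, is_round, has_cleared_orbit):
    """Apply the three questions in order; all three 'yes' answers make a planet."""
    if orbits_sun and is_round and has_cleared_orbit:
        return "planet"
    if orbits_sun and is_round:
        # Round, but shares its orbital zone with other debris.
        return "dwarf planet"
    return "other body"  # placeholder label, not an official category


print("Earth:", classify(True, True, True))
print("Pluto:", classify(True, True, False))
print("Eris: ", classify(True, True, False))
```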

Measurements of Eris’s size from a Novem- 
ber 2010 occultation may leave Eris slightly 
smaller than Pluto, instead of slightly larger as 
Brown had previously determined. Although 
this revelation has resurrected the efforts of 
some Pluto defenders, the IAU definition 
remains robust against arguments of size. 

So although Brown did not kill Pluto all 
by himself, he is guilty of providing wood 
and nails to construct its coffin. And my 
museum colleagues and I have someone to 
whom we can forward the hate mail we still 
get from Pluto-loving schoolchildren. 


Neil deGrasse Tyson is an astrophysicist at 
the American Museum of Natural History, 
New York, USA. He is author of The Pluto 
Files and host of a PBS NOVA television 
programme of the same name. 


Further reading accompanies this article online at 
go.nature.com/c9ggk9. 


Books in brief 


Beyond Humanity?: The Ethics of Biomedical Enhancement 
Allen Buchanan OXFORD UNIVERSITY PRESS 256 pp. $25 (2011) 
Since humans developed tools, we have sought to improve 
our performance through technology. Enhancements using 
biotechnologies should be seen in the same evolutionary context, 
argues philosopher Allen Buchanan. Increasing our memory, 
cognitive power, stamina or resistance to disease using drugs and 
genetic editing offers sufficient benefits to our species that we 
should set aside objections. He urges that evolutionary biology 
should be included in ethical debates about biotechnology and 
enhancement. 


Alone Together: Why We Expect More from Technology and Less 
from Each Other 

Sherry Turkle BASIC BOOKS 384 pp. $28.95 (2011) 

The illusion of companionship fostered by technology is the focus of 
sociologist Sherry Turkle’s latest book. From Facebook to robots, she 
examines how social networks give us ‘friends’ without the demands 
of intimacy, and how virtual environments allow us to overcome risk 
without consequences. Despite taking increasing hold of our lives, 
she argues, computers and robots will ultimately result in isolation, 
reduced privacy and diminished social skills. Yet she hopes that, by 
asking new questions, the young will overcome these downsides. 


Virtually You: The Dangerous Powers of the E-Personality 
Elias Aboujaoude W. W. NORTON 349 pp. $26.95 (2011) 
Just as the persona we present to our work colleagues and our 
family differs, psychiatrist Elias Aboujaoude argues that we show 
a separate character online. From studying patients who have 
become mentally disturbed through excessive Internet use, he 
examines the construction of this e-personality, which reveals itself 
in the style of our e-mails, the users we associate with in our social 
networks and our online shopping habits. The impatient, urgent 
and unfocused nature of Internet usage also seeps into our offline 
world, he argues. 


World Wide Mind: The Coming Integration of Humanity, Machines, 
and the Internet 

Michael Chorost FREE PRESS 256 pp. $26 (2011) 

Having relied since 2001 on bionic ear implants for his hearing, 
science writer Michael Chorost offers a personal account of the 
borderline between humans and machines. After exploring the 
technologies that might be used to fix or enhance our bodies, 
with a focus on brain implants, he argues that such technologies 
need not depersonalize us. As well as overcoming physical 
problems, embedded brain chips might one day transform human 
communication by literally plugging us into the World Wide Web. 


Kingpin: How One Hacker Took Over the Billion-Dollar Cybercrime 
Underground 

Kevin Poulsen CROWN 288 pp. $25 (2011) 

Hacker-turned-journalist Kevin Poulsen investigates cybercrime in 
his latest book. He spotlights a notorious figure who took over a giant 
online criminal network and siphoned off millions of dollars from the 
US economy. Sought by the FBI worldwide, the hacker turned out 
to be security consultant Max Butler. Poulsen portrays both sides of 
the story and exposes the range of ongoing frauds, from phishing to 
Trojan viruses to counterfeiting.


The plantswoman who
dressed as a boy 


The tale of the first female to sail around the world 
deserves a more accurate telling, says Sandra Knapp. 


In 1737, the great Swedish botanist Carl
Linnaeus, who established the system by
which we name animals and plants today,
posed in an authentic Sami costume from
Lapland, not realizing that it was a woman’s
outfit. Another case of eighteenth-century
botanical cross-dressing is related in Glynis
Ridley’s book. French botanist Jeanne Baret
dressed as a boy to gain passage on explorer
Louis-Antoine de Bougainville’s voyage to
circumnavigate the globe.

The intrepid Baret saw more biodiversity
than the notoriously stay-at-home Linnaeus,
yet she is not well known. In The Discovery
of Jeanne Baret, Ridley purports to resurrect
her name and accomplishments. Sadly, the
author does not convincingly deploy the few
facts available, so the book feels more like
fiction than non-fiction, a novel whose
characters are real people from history.

Explorers in the eighteenth century were
overwhelmingly male, so the tale of a young
woman who dressed as a man to see the world
has huge appeal. De Bougainville set off in
1766 on the first French circumnavigation
of the globe, which was also the first by any
nation to include a professional naturalist to
record the plants and animals of new lands.

The naturalist was Philibert Commerson
(or Commerçon), a friend of Voltaire and
correspondent of Linnaeus. He was accompanied
by Baret, a peasant woman from Burgundy
who was his housekeeper and lover. Ridley
weaves a tale of Baret’s early life from
wafer-thin evidence comprising a few official
documents, such as birth certificates, and a book
of medicinal plants attributed to Commerson,
which Ridley speculates was written by Baret.
She is portrayed as a ‘herb-woman’ whose
practical knowledge was useful to Commerson:
his teacher rather than his assistant.

That Baret had the guts and determination
to overcome gender and class barriers
is without doubt. But Ridley does both Baret
and Commerson a disservice by painting
the latter as a parasitic ne’er-do-well while
Baret comes across as a hard-working servant.
I feel that Ridley misrepresents what
must have been a truly collaborative
partnership in the discovery of botanical and
zoological wonders. Why else would Baret
have stayed with Commerson in Mauritius
and Madagascar after they left the expedition,
only parting from him at his death
in 1773?

NATURE.COM
Tracy Chevalier on Victorian fossil hunter Mary Anning:
go.nature.com/42Iter

The Discovery of Jeanne Baret: A Story of Science, the High Seas and
the First Woman to Circumnavigate the Globe
GLYNIS RIDLEY
Crown: 2010. 304 pp. $25

How Baret concealed her gender for
so long in the close confines of the ship is
unclear. She must have bound her breasts, and
Ridley’s version has her hiding out in
Commerson’s cabin. Women disguising themselves
as men was not unknown at the time — in 
1745, Hannah Snell dressed as a man and 
enlisted in the British Marines, serving for 
five years and completing tours of India; 
her book The Female Soldier, published 
in 1750, was hugely popular. 

De Bougainville’s official 
accounts of the voyage place 
the unmasking of Baret’s gen- 
der in Tahiti, where the expedi- 
tion stayed in April and May of 1767. 
Ridley, however, consulted previously unused 
journals and official documents and presents 
a different tale — predatory, sex-mad sailors 
and a lone vulnerable woman left unprotected
by her partner. The journal kept by the ship's 
surgeon Francois Vives is dripping with sex- 
ual innuendo; his account of the discovery of 
Baret’s true gender is corroborated by other 
diaries. Ridley weaves a story of Baret’s vio- 
lation and subsequent pregnancy from his 
coy references to “concha veneris” (the Venus 
shell) and to the “Jeanneton” of French folk- 
song who loves her attackers. Vivés’ account 
puts Baret’s unmasking and violation on the 
shores of Papua New Guinea in June. 

This makes compelling reading, but I 
would be more inclined to take the history 
seriously if Ridley had not got the scientific 
aspects so wrong. It is easy to check how many 
plants Commerson and Baret collected, and 
from where. Although Commerson never 
made it back to France before his death in 
Mauritius, the collections did. A search under
his name in the database of plants held at the
National Museum of Natural History in Paris 
finds 1,735 specimens and their countries of 
origin: 234 from Madagascar, 144 from Mau- 
ritius, 84 from Brazil and so on. Most have 
on them old, handwritten labels with short 
descriptions, perhaps jotted in the field. 

Ridley maintains, for instance, that 
Commerson and another member of the 
expedition, the Prince de Nassau-Siegen, 
collected no plants during a short stay on 
Java. The database differs — 50 specimens 
were retrieved, many of them forming the 
basis for new species. To highlight such inac- 
curacies might seem pedantic, but Ridley’s 
story revolves around Baret struggling with 
heavy loads of plant presses, vials, nets and 
jars while Commerson swans around. 

I, too, collect plants and carry parapherna- 
lia — heavy work, but great fun. Commer- 
son and Baret encountered a huge wealth of 
diversity. Commerson wrote of Madagascar:
“I can announce to naturalists that this is
the true Promised Land. Here nature created
a special sanctuary where she seems
to have withdrawn to experiment with
designs different from those used anywhere
else. At every step one finds more
remarkable and marvellous forms of life.”

Botanist Jeanne Baret depicted en travesti.

Ridley describes how Baret discov- 
ered Bougainvillea in the forests of Rio de 
Janeiro using the doctrine of signatures, 
a medieval method by which herbalists 
attributed curative powers to plants on 
the basis of their appearance — a walnut 
was good for brain trouble, red things for 
wounds. Commerson had an ulcerating 
sore on his leg, so Ridley writes of Baret 
searching frantically for a cure, only to find 
it in the red bracts of Bougainvillea hold- 
ing a pea-like pod that reminded her of the 
red-flowered runner beans from home. 
However, Bougainvillea does not have 
fruits like a pea, nor do the notes on the 
specimen mention any medicinal value. 

In his notes, Commerson honoured 
high-ranking members of the expedition 
— its leader is commemorated in Bougain- 
villea, Nassau-Siegen in Nassauvia — but 
he did not publish these names. Bougain- 
villea was formally described in 1789 by 
Antoine Laurent de Jussieu, a French bot- 
anist who used Commerson’s specimens 
and notes. Commerson also proposed the 
name Baretia for a Malagasy tree. 

Ridley maintains that Commerson was 
an arrogant man who named things for 
himself. Yet the International Plant Names 
Index shows 119 species of flowering 
plants named in his honour — by others. 
None is noted as ‘commersonii’ on its orig- 
inal label. Commerson’s Baretia was never 
published, not because someone wanted to 
do Baret down, but because it was found, 
on the specimen’s return to Paris, that the 
genus already had a name.

After Commerson died, Baret married 
a French officer, Jean Duberna, on Maur- 
itius and returned to France in 1774. She 
was awarded a state pension from 1785 in 
recognition of her bravery and contribu- 
tions. She was not forgotten, although she 
never practised botany again. 

Science was as collaborative then as it 
is now, but women’s contributions were 
often overlooked in favour of those of 
male colleagues — a trend that continues 
today. Baret and other neglected contribu- 
tors deserve recognition, but she does not 
need to be cast as a victim to be seen as 
a success, or her undoubted accomplish- 
ments overinflated. She, and women 
scientists in general, deserve better.


Sandra Knapp is a botanist at the 
Natural History Museum, London 
SW7 5BD, UK. 

e-mail: s.knapp@nhm.ac.uk


The mind’s ability to adapt suggests that it can cope with our wired world, for better or worse.


Browsing and the brain 


Two books reach opposite verdicts on how the Internet 
affects us, find Daphne Bavelier and C. Shawn Green. 


Whenever a new technology reaches
a tipping point of popularity, questions
soon follow about its effects
on society. The rise of the Internet has
provoked two books probing its impact on the
human brain. The fact that the authors reach 
opposite conclusions, despite relying on the 
same scientific evidence, underscores how lit- 
tle research has been done on this topic. 
Nicholas Carr’s The Shallows laments the 
possibility that long-term Internet exposure 
will sap us of our capacity for contemplation. 
At the base of his argument is the fact that 
the human brain is remarkably plastic. Carr 
makes this point compellingly using a mix- 
ture of historical anecdotes and interviews 
with experts in the neuroplasticity field, such 
as Michael Merzenich and Eric Kandel. 
Having established that brains are 
constantly reshaped by experience, Carr 
argues that changes induced by Internet use, 
such as greater brain activation during web 
browsing, may not be in our best interests. 
If the brain adapts completely to the frenetic 
nature of the Internet, he warns, we may lose 
our capacity for absorbing practices such as 
reading a book. He worries that we may lose 
the very essence of what makes us human. 
Nick Bilton’s I Live in the Future is much 
more optimistic. Humming with enthusi- 
asm for the continuing Internet revolution, 
he argues that social and cognitive changes 
are an inevitable consequence of any major 
technological advance and that our new 
abilities cannot be put back in the box. 


The Shallows: What the Internet is Doing to 
Our Brains/How the Internet is Changing the 
Way We Think, Read and Remember 
NICHOLAS CARR 

W. W. Norton/Atlantic Books: 2010. 276 pp./384 pp. 
$26.95/£17.99 


I Live in the Future & Here’s How it Works:
Why Your World, Work, and Brain are Being 
Creatively Disrupted 

NICK BILTON 

Crown: 2010. 304 pp. $25, £16.99 


Such tension is to be expected whenever 
new forces enter society. By analogy, Carr 
discusses historical fears that the written 
word would act as a replacement for memory,
resulting in humans that were ‘shallower
thinkers’. Bilton notes early worries that the
freedom of travel offered by the railway would 
result in weakening moral standards. Both 
books review suspicions that most people 
would prefer to listen to a book than to read 
one, leading to concerns that the invention of 
the phonograph would kill the art of writing. 

Is the Internet different? Bilton and Carr 
rely on the same scientific facts to argue 
persuasively for opposite positions. For exam- 
ple, functional magnetic resonance imaging 
studies show that Internet searches activate a 
larger network of brain areas than does sim- 
ple text reading. Web browsing also requires 
additional types of mental processing — 
evaluating hyperlinks to make navigational 
decisions and filtering photos, videos and 
menus. As a result, brain activation is greater 
during Internet searches in people who are
‘net savvy’ than in those who are ‘net naive’.

These findings cannot answer the ques- 
tion of whether such changes are good 
or bad. Conclusions are coloured by the 
authors’ values. Bilton treats the adaption 
of the ‘net savvy’ as positive: “the brains 
were learning, benefiting from prac- 
tice and experience”. Carr comes to the 
opposite conclusion: “When it comes to 
the firing of our neurons, it’s a mistake to 
assume that more is better”.

Part of the problem is the paucity of 
scientific studies on the effects of modern 
technologies on the brain. It is a testament 
to both authors’ skills that they were able to 
produce entire books on works so sparse. 
Unfortunately, to fill the pages, they lump 
information into categories that are too 
diverse to be useful. For example, both treat 
the use of all Internet technology (web
browsing, web searching, texting, tweeting,
video games and so on) as a single
activity, despite the fact that such variety is 
unlikely to have one distinct effect. As with 
food, the effects of technology will depend 
on what type of technology is consumed, 
how much and for how long. 

History suggests that technology 
does not change the brain’s fundamental 
abilities. The general principles of brain 
organization have not changed for thou- 
sands of years — probably since the rise of 
language. Major technological advances 
do not create de novo brain structures. 
They do, however, take advantage of the 
cognitive flexibility of the human mind. 

With each new technological develop- 
ment, we see a shift in the cognitive abili- 
ties and brain functions that society values 
most. The advent of writing systems, so 
celebrated by Carr, devalued the role of 
oral memorization through storytelling 
as cherished by the Greeks. Great orators 
such as Socrates would have lamented that 
Carr has lost the memory skills necessary 
for passing on knowledge through stories 
to future generations. Yet he has gained 
other skills by entraining alternate brain 
networks for reading and text analysis. 

Just as it was difficult to say at the time 
whether the advent of writing was good 
or bad, a value judgement of the effect of 
the Internet is impossible. But it is a trib- 
ute to neural plasticity that, with each new 
technological development, our brains 
adapt, for better or for worse.


Daphne Bavelier is a professor in the 
Department of Brain and Cognitive 
Science at the University of Rochester, 
New York 14627, USA. 

e-mail: daphne@cvs.rochester.edu 

C. Shawn Green is a cognitive scientist 
in the Department of Psychology at the 
University of Minnesota, Minneapolis, 
Minnesota 55455, USA.


Abstract relativity


A Paris exhibition contrasts 1920s depictions of the fourth 
dimension, find Stefan Michalowski and Georgia Smith. 


The birth of modern physics a century
ago fired artistic as well as scientific
imaginations. This can be seen in
the Pompidou Centre’s current exhibition 
of abstract art, covering Dutch painter Piet 
Mondrian and the De Stijl group, led by 
another Dutchman, Theo van Doesburg. 

A series of canvasses illustrates the 
evolution of abstract techniques, from the 
soft contours of impressionism to the spare 
geometry of cubism. “We arrive at a portrayal 
of other things, such as the laws governing 
matter,” Mondrian wrote. Cubist techniques
were inspired, in part, by the multi-dimen- 
sional mathematics of Henri Poincaré and 
his contemporaries. 

Most of the exhibition is rightly devoted to 
Mondrian and the devel- 
opment of his recogniz- 
able mature style. From a 
minimal toolbox of visual 
elements — white canvas, 
black lines and simple 
blocks of red, yellow or 
blue — emerge geometric 
compositions of startling 
intensity and elegance. 

Mondrian was deeply 
influenced by theosophy, 
a spiritual movement 
grounded in ancient texts 
that was bent on uncover- 
ing universal truths in art, 
religion and science. He 
penned reams of theory as to why his abstract 
style was the appropriate expression of these 
“great generalities” for modern times. 

A quiet introvert from a Calvinist family, 
Mondrian became a mentor to van Doesburg, 
by contrast a flamboyant young painter who 
had three wives and many artistic cliques in 
his short life (he died aged 47). When van 
Doesburg moved to Paris in 1923, the two 
men worked closely: their canvasses form a 
dialogue as each sparked fresh innovations 
from the other. But their intense relationship 
exploded a year later — and one of the flash- 
points was the theory of relativity. 

Theo van Doesburg’s use of diagonals is
symbolic of his quarrel with Piet Mondrian.

Mondrian/De Stijl
Centre Pompidou, Paris.

The public learned about Albert Einstein’s
discoveries after the First World War, when the
solar eclipse of 1919 confirmed general relativity
by showing that gravity can bend light. In
Paris, space-time became a catchword in
avant-garde circles. Artists from futurists to Dadaists
latched on to the new ideas. Van Doesburg
had already flirted with spatial geometry in
four dimensions: the exhibition includes
some of his tesseracts, projections on paper
of four-dimensional cubes. Then, in the 1920s,
he began trying to
evoke time and change — four-dimensional 
space-time — in his paintings. 

Mondrian rejected van Doesburg’s attempt, 
and the two split over it. Symbolic of their rift 
was van Doesburg’s use of dynamic diagonal 
lines, which contrasted with Mondrian’s strict
vertical and horizontal grids. But the quar- 
rel went deeper than diagonals: Mondrian’s 
doggedly developed style had become too 
much of a constraint for his former coterie.

The De Stijl artists wanted to remake the 
human environment 
by designing furniture, 
buildings and cities 
based on their primary- 
coloured, idealized 
structures. Van Doesburg 
experimented with archi- 
tectural designs and films 
incorporating the fourth 
dimension. Some of these 
products are displayed in 
the exhibition, but the 
role of the fourth dimen- 
sion is not clearly shown 
or explained. The artists 
themselves do not always 
seem to have grasped the 
difference between a fourth dimension in 
space versus one in time. 

As the artists tried to incorporate the new- 
found laws of physics in their expressions of 
absolute truth about the Universe, history 
ambushed them. Their comrades in abstrac- 
tion were soon brutally dismissed by the 
Soviet and Nazi authorities. Einstein helped to 
pull the rug out from under their depictions of 
the ‘absolute’ by dissolving special relativity’s 
neat geometries into quantum theory's fuzzy 
clouds of probability. But Mondrian’s precise 
vision, with its subsumed scientific borrowings,
continues to intrigue and delight.


Until 21 March 2011.


CORRESPONDENCE


Negative results 
need airing too 


The problem of the invisibility 
of negative results is underlined 
by the media storm over a 

paper supporting extrasensory 
perception being published in 

a reputable psychology journal 
(see The New York Times, 

5 January 2011). Although 
individual reports might be 
statistically valid in isolation, 
their conclusions could still be 
questionable — other test results 
of the same hypothesis must also 
be taken into account. 

Say a study finds no statistically 
favourable evidence for a 
hypothesis at the predetermined 
significance level (P=0.05, for 
example) and, like most with 
negative results, it is never 
published. If 19 other similar 
studies are conducted, then 20 
independent attempts at the 
0.05 significance level are, by 
definition, expected to give at 
least one hit. A positive result 
obtained in one of the 19 studies, 
viewed independently, would 
then be statistically valid and 
so support the hypothesis, and 
would probably be published. 
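
The arithmetic behind this argument can be sketched in a few lines. The snippet below is only an illustration, assuming that the hypothesis is false and that the 20 studies are fully independent; the function names are hypothetical and do not come from the letter. Under those assumptions, the expected number of spurious hits is 20 × 0.05 = 1, and the chance of at least one is roughly 64%.

```python
def expected_false_positives(n_studies: int, alpha: float) -> float:
    """Expected number of spurious 'hits' when the hypothesis is false."""
    return n_studies * alpha


def prob_at_least_one_false_positive(n_studies: int, alpha: float) -> float:
    """Chance that at least one of n independent studies crosses the threshold."""
    # Each study avoids a false positive with probability (1 - alpha);
    # independence lets us multiply those probabilities together.
    return 1.0 - (1.0 - alpha) ** n_studies


if __name__ == "__main__":
    n, alpha = 20, 0.05  # the figures used in the letter
    print(f"expected false positives: {expected_false_positives(n, alpha):.2f}")
    print(f"P(at least one): {prob_at_least_one_false_positive(n, alpha):.3f}")
```

If the studies share data or methods, the independence assumption fails and these figures would shift, so they are best read as a rough guide.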

Statistical corrections are 
routinely made for multiple 
testing within a study, but they are 
important across studies too. The 
difficulty lies in determining the 
number of parallel investigations 
of the same hypothesis. Perhaps 
different disciplinary research 
societies could help bring these 
covert experiments to light. 
Nitin Gupta, Mark Stopfer 
NICHD, National Institutes of 
Health, USA. 
nitin.gupta2@nih.gov 


Think bigger for 
conservation 


The US initiative to ‘think big’ 
about landscape-conservation 
cooperatives is an imaginative 
approach to conserving species 


in the face of climate change 
(Nature 469, 131; 2011) — but 
thinking needs to be bigger still. 
Because climate change is 
likely to shift entire biomes, 
we urge proponents to include 
the entire continent as a 
management area, with flexible 
borders between particular units. 
We suggest that such 
cooperatives should collaborate 
with and learn from other large- 
scale conservation ventures, 
such as the International Model 
Forest Network — an integrated 
resource-management system 
that has operated globally since 
1992 — and Natura 2000, in 
which different sectors and 
agencies are collaborating across 
Europe to conserve biodiversity. 
Cooperation between 
agencies at various levels 
and geographical locations 
could then be tailored to 
meet particular conservation 
requirements. 
Malgorzata Blicharska, 
Grzegorz Mikusinski Swedish 
University of Agricultural 
Sciences, Sweden. 
malgorzata.blicharska@slu.se


Research type can 
affect citation rate 


You do not mention possible 
confounding factors in your 
discussion (Nature 468, 1011; 
2010) of the reported positive 
effect of first and last author 
geographical proximity on paper 
citations (K. Lee et al. PLoS ONE 
5, e14279; 2010).

One is that these are 
biomedical research papers. This 
field has many different author- 
sequence conventions and 
citation cultures. 

In basic research, the first 
author on a publication is 
typically a PhD student and 
the last author is his or her 
supervisor. The papers come 
from closely knit research 
groups, especially in molecular 
biology, and tend to have zero 


distance between the first and 
last authors, and to be cited 
more frequently than clinical 
research papers. 

By contrast, clinical research 
projects typically have no 
clear hierarchical structure 
among collaborators, and often 
apply alphabetical ordering of 
co-authors. Hence, the type 
of research could also explain 
the positive correlation you 
discuss. 

The challenge in training 
researchers to collaborate 
on publications is to find a
balance between face-to-face 
discussion and the use of new 
communication technologies. 
Henk Moed Elsevier, the 
Netherlands. 
h.moed@elsevier.com 


Controversy over 
GM maize in Peru 


Researchers from the 
Peruvian National Institute 
for Agricultural Innovation 
(INIA) — which has been 
enforcing national and 
international policy on biosafety 
in agriculture since 1999 — 
have investigated claims that 
genetically modified maize 
(corn) is being farmed in the 
Barranca valley north of Lima 
(see go.nature.com/ijkpkz). 
The INIA analysed the 
source and quantity of maize 
imports, records of seed 
cultivars, their genetic diversity 
and planting location. Samples 
were also tested from the 
Pativilca River basin — the 
main river in Barranca and its 
neighbouring valleys. These 
came from maize fields, local 
markets, a local collecting 
facility and seed companies 
that sell poultry feed. 
Evidence of transgenes 
was discovered in only some 
of the poultry grain samples 
(full details are available in 
Spanish at go.nature.com/ 
ikgyqj). This finding is not 


surprising. Peru imports about 
1.5 million tonnes of maize 
grain annually — mainly for 
animal feed — from Argentina 
and the United States, where 
genetically modified maize is 
widely grown. 

We believe that the Barranca 
region today is unlikely to 
be a primary centre of maize
diversity. However, farmers 
there may be growing maize 
hybrids and other cultivars that 
have seeds of foreign origin. 
Luis Fernando Rimachi 
Gamarra, Jorge Enrique 
Alcantara, Rodomiro Ortiz 
Instituto Nacional de Innovacion 
Agraria, Peru. 
rodomiroortiz@gmail.com 


Self-plagiarism in 
music and science 


Composers are much more 
relaxed about self-plagiarism 
than scientists. It was practised 
by the best: take Bach’s Christmas 
Oratorio, which recycles several 
of his secular cantatas, and 
Mozart’s Mass in C Minor, 
which was transformed into his 
Davidde Penitente. 

As for Handel, he was prone 
to reproducing his own and his 
colleagues’ music with equal 
nonchalance. His love duet ‘No 
[pause] di voi non vo fidarmi’
becomes ‘For [pause] unto us a 
child is born’ in Messiah. Same 
music, different atmosphere. 

Some scientists might also 
defend self-plagiarism on 
the grounds that the data are 
the same but the conclusions 
are not. Even my venerable 
professor of biochemistry, 
when I chided him for setting 
his students the same exam 
questions he had asked us 
20 years before, replied tersely, 
“The questions are the same, the 
answers are different.” 

Renato Baserga Kimmel Cancer 
Center, Pennsylvania, USA. 
r_baserga@kimmelcancercenter.


OBITUARY


Eugene Goldwasser 


(1922-2010) 


Discoverer of the hormone that regulates the production of red blood cells. 


Eugene ‘Gene’ Goldwasser made
one of the outstanding advances
in twentieth-century biomedicine.
Through decades of effort, he purified and 
initially characterized the properties of the 
major hormonal regulator of red-blood-cell 
production — erythropoietin (EPO) — an 
advance that rivals the discovery of insulin 
in its importance. Goldwasser’s work was 
instrumental to the large-scale production of 
recombinant EPO, which since the late 1980s 
has been used to induce red-cell formation, 
especially in anaemic patients with chronic 
renal disease. 

Hundreds of thousands of people now 
benefit each year from this therapy, which 
has a multibillion-dollar annual mar- 
ket. Goldwasser himself — who died on 
17 December 2010, from renal complica- 
tions associated with prostate cancer — did 
not become rich. He was more captivated by 
the science than the money. Today, studies of 
EPO continue to provide breakthroughs in 
cytokine and receptor biology. 

Goldwasser was born in Brooklyn, New 
York, in 1922, but the Depression forced his 
family to close their clothing business and 
move to Kansas City. In high school, Goldwass- 
er’s scientific interests were already keen, and 
landed him a scholarship at the University of 
Chicago, Illinois, for his undergraduate degree 
in biological sciences. Following two years 
of US army service as a biochemist work- 
ing on anthrax at Fort Detrick in Frederick, 
Maryland, Gene returned to the University 
of Chicago and completed his PhD in bio- 
chemistry in 1950. As a postdoctoral fellow 
in Copenhagen he trained in cytophysiology 
— the study of the biochemistry of cells. In 
1952 he returned to the University of Chi- 
cago. His initial studies there, at the Argonne 
Cancer Research Hospital, focused on 
new approaches to treating leukaemia. 


HORMONE HUNT 

The first clues to EPO’s existence were 
found more than a century ago. This early 
work relied on highly variable measures of 
the number of mature blood cells, so the 
existence of such a substance remained con- 
tentious. In the 1940s, researchers had more 
precise measures of red-cell precursors, and 
named the proposed substance EPO; by the 
1950s, more definitive results came from 
experiments using surgically conjoined 
rats and acutely anaemic animals. Still, no 
one knew exactly what was stimulating 


the formation of red blood cells. 

In 1953, Goldwasser turned his attention 
to the EPO enigma. From 1956 to 1959, he 
worked aggressively and published 15 EPO- 
related papers, at a time when not everyone 
was convinced that EPO even existed. These 
included seminal studies published in Nature 
and Science showing that the kidney was a 
prime site of EPO production, and that cobalt 
could induce EPO production. He also developed
a sensitive assay for EPO activity using
a radioactive isotope of iron, which enabled 
the development of purification strategies. 

Goldwasser originally estimated it would 
take him about six months to purify EPO. 


It took 17 years. EPO from any source 
existed in only vanishingly low quantities, 
and it tended to degrade during attempts at 
purification. In early studies, Goldwasser 
used litres of plasma from anaemic sheep, 
but even increasing the EPO concentration 
1-million-fold didn’t produce a pure sample. 
He tried many sources, and had some luck 
with urinary protein from anaemic Argen- 
tine patients with hookworm. A major 
breakthrough came on Christmas morning 
in 1975, when Goldwasser’s colleague Takaji 
Miyake of Kumamoto University, Japan, 
hand-delivered a special gift: processed 
protein from more than 2,500 litres of urine 
from patients with aplastic anaemia. Others 
might have preferred fruit cake. Goldwasser 
and his colleagues used this unique gift to
finally purify, in 1977, about 8 milligrams of
human urinary EPO. 

By the 1980s, other researchers began 
entering the field that Goldwasser had 
opened up. I joined Children’s Hospital 
Boston and Harvard Medical School in 
Boston, Massachusetts, to work on EPO as 
a postdoctoral fellow. When I met Gene for 
the first time at a conference, he smiled and 
said: “It’s nice to have you in the field, but 
should you not be in Chicago?” — working 
with him, in other words. 

Goldwasser was captivated by EPO’s 
basic science, but was also keenly aware of 
its potential clinical importance. In 1978, 
for example, he was involved with an early 
small clinical trial. At his university, he had 
initiated some paperwork for an invention 
disclosure, but this did not move forward. In 
the early 1980s, the idea of commercializing 
such work was quite new, as were the first 
biotechnology companies. 

Goldwasser did agree to consult for 
a biotechnology company — Amgen. 
During the 1980s, some of the peptide 
fragments of human EPO were sequenced,
a first step towards cloning the human
EPO gene and producing clinically use- 
ful quantities of recombinant protein. As 
testimony to his excitement, Goldwasser 
unwittingly presented one such sequence at 
a national meeting. Fortunately for Amgen, 
the sequence turned out to have several 
mistakes, and no one else gained from the 
information. Amgen eventually took the 
lead in commercializing EPO. 

Gene truly had a lifelong fascination 
for science. Had he chosen, he could have 
made great contributions in other areas of 
biochemistry: in 1953, for example, he pub- 
lished impressive work in Nature on nucle- 
otide biosynthesis. From 1994 to 1998, he 
returned from retirement to chair the Uni- 
versity of Chicago's biochemistry depart- 
ment, and continued his studies of EPO 
from 1998 to 2002, eventually retiring again 
at the age of 80. His unassuming nature, 
critical mind, collegiality and dedication
(including to the Chicago Blackhawks ice-hockey
team) will be greatly missed.


Don Wojchowski is director of the 
COBRE Center of Excellence in Stem Cell 
Biology and Regenerative Medicine at the 
Maine Medical Center Research Institute, 
Scarborough, Maine 04074, USA. 

e-mail: wojchd@mmc.org


FORUM Drug discovery


A question of library design 


Two approaches have emerged for creating libraries of compounds for use in biological screening assays for drug discovery — 
fragment-based ligand design and diversity-oriented synthesis. Advocates of each approach discuss their favoured strategy.


THE TOPIC IN BRIEF

• There is an urgent need to improve the libraries of compounds used for drug
discovery, to find better leads for medicinal chemistry programmes.

• Fragment-based ligand design involves screening small molecules that aren’t
intrinsically drug-like, but that might become subunits (fragments) of drug-like
compounds.

• Diversity-oriented synthesis aims to make many structurally varied, drug-like
compounds for screening, using modular syntheses that involve few steps.

• The lead compounds identified from diversity-oriented synthesis generally
differ markedly from those obtained from fragment libraries.

• The pros and cons of the two approaches, and the chances of success of subsequent
drug programmes, are a matter of vigorous debate.


Small molecules,
great potential


PHILIP J. HAJDUK


Both fragment-based screening (FBS) and diversity-oriented synthesis (DOS) grew
out of the recognition 15 years ago that the libraries of compounds available for
high-throughput screening (Fig. 1) were inadequate for many lead-discovery
campaigns. As a result, almost every large pharmaceutical company undertook
library enrichment exercises, all of which incorporated some aspect of DOS’. The
challenge has always been to balance the size and structural diversity of compound
collections against the cost associated with screening the compounds, while
addressing the needs (binding affinity, selectivity and so on) of the anticipated
portfolio of biological targets. FBS and DOS represent two extreme views on how
to address these issues. I believe that FBS is the better strategy.

Proponents of DOS advocate the production of numerous sets of compounds that have
molecular structures not represented in existing libraries, to continuously fill
the gaps in ‘chemical-diversity space’. But this approach potentially yields
millions of compounds that must all be screened at the start of drug-discovery
programmes. Proponents of FBS believe that tremendous (and probably sufficient)
chemical diversity can be represented in a library of several thousand ‘fragments’
(Fig. 2a). Indeed, fragment collections of as few as 1,000 molecules can arguably
represent the chemical diversity contained in tens of millions of larger, more
drug-like compounds’. Thus, FBS libraries achieve greater chemical diversity than
even the largest available compound libraries, and can be screened far more
cost-effectively.
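One rough way to see how a library of 1,000 fragments could stand in for millions of larger molecules is to count how many distinct sets of two or three fragments it contains. The sketch below is only a back-of-envelope illustration: the function name is hypothetical, and the idea that a larger compound corresponds to an unordered set of linked fragments is an assumption made here, not a method described in the text.

```python
from math import comb


def fragment_sets(library_size: int, fragments_per_compound: int) -> int:
    """Count unordered sets of distinct fragments drawn from a library.

    Assumption for illustration only: each larger compound is treated as
    a set of distinct fragments, ignoring linker chemistry, orientation
    and synthetic feasibility.
    """
    return comb(library_size, fragments_per_compound)


library_size = 1_000  # library size quoted in the text

pairs = fragment_sets(library_size, 2)
triples = fragment_sets(library_size, 3)

print(f"two-fragment sets:   {pairs:,}")
print(f"three-fragment sets: {triples:,}")
```

Under these assumptions the count rises from about half a million pairs to more than a hundred million triples, which brackets the 'tens of millions' figure quoted above.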

Most DOS libraries are prepared in a purely 
speculative manner, in the sense that it is not 
known if members of the libraries will be active 
against any relevant biological target. A signifi- 
cant up-front investment in compound syn- 
thesis must therefore be made that may never 
pay off. By contrast, compounds based on frag- 
ment leads are always directed towards, and 
dictated by, the target under study. This allows 
chemists to dedicate their efforts primarily to 
current drug-discovery targets, rather than 
diverting resources to the potentially wasteful 
production of compounds that have unknown 
biological activities. 

Finally, there is ample evidence that larger 
molecules are less likely than smaller ones 
to succeed as drugs in clinical trials’, mainly 
because their physico-chemical properties are 
not drug-like. It is therefore vital to identify 
drug candidates that not only are potently active 
at a biological target, but also have acceptable 
physico-chemical properties. Because frag- 
ment-based design involves the tailored con- 
struction of drugs from compounds that are 
soluble and of low molecular mass, this strat- 
egy offers the greatest potential for discover- 
ing the smallest possible compounds that bind 
most efficiently to a particular target — that is, 
compounds in which all the structural features 
contribute to binding. 

By contrast, many of the properties of
compounds in DOS libraries are not drug-like.
So, even if these compounds appear as
hits (active compounds) in a screen, many
analogues may have to be made to find one that
is not only active at a biological target, but also
‘druggable’. Indeed, most compounds in DOS
libraries would be excluded from many corporate
screening collections because of their poor
physico-chemical properties.

Figure 1 | Stack ’em up. Part of the compound
library at the Sanofi-aventis laboratory in Toulouse,
France. More than 1 million compounds, stored in
trays of vials, are kept here.

In summary, fragment-based drug design 
offers several advantages over DOS: fragment 
libraries are more diverse, synthetic resources 
are used more efficiently and the leads identi- 
fied from FBS are more likely to yield drug 
candidates that have optimal physico-chemical 
properties. Indeed, several compounds derived 
from fragment-based drug design are already in 
clinical trials’, providing substantial justifica- 
tion for further investment in this strategy by 
the pharmaceutical industry. 


Philip J. Hajduk is in the Department of Lead 
Discovery, Abbott Laboratories, Abbott Park, 
Illinois 60064, USA. 

e-mail: philip.hajduk@abbott.com 
Better leads come
from diversity 


WARREN R. J. D. GALLOWAY & 
DAVID R. SPRING 


The advantage of the DOS approach is
that it efficiently creates structurally
diverse molecules whose molecular masses 
are usually close to those of drug-like com- 
pounds”® (Fig. 2b). Moreover, DOS pro- 
vides access to molecules that have thus far 
escaped the attention of humans and per- 
haps even nature’. We believe that screening 
such molecules is the best general approach 
to finding a new lead compound for drug 
discovery. Let us illustrate why. 

Put yourself in the shoes of a scientist who 
wants to develop a new drug to treat a disease 
such as cancer, but who does not know the 
precise nature of the relevant disease-causing 
target. A direct way to address this problem is 
to screen a library of molecules to see whether 
any of them kill cancer cells selectively. To 
maximize success, you need a collection of 
structurally diverse, drug-like molecules, 
such as those produced from DOS. A frag- 
ment library would be completely inappropri- 
ate, because molecules this small do not bind 
to drug targets with sufficient potency and 
specificity to be identified in such screens. 

The use of a fragment library is viable, 
however, when you know exactly what the pro- 
tein target is. To be fair, this is often the case for 
pharmaceutical companies today. But because 
fragment-based drug discovery requires 
knowledge of the way in which substrates bind 
to targets, this approach works only if you have 
a water-soluble protein for which much struc- 
tural information is available, and for which 
the molecular binding modes of substrates 
are easily obtainable using methods such as 
nuclear magnetic resonance (NMR) spectros- 
copy or X-ray crystallography*”*. Unless this 
is the case, then screening structurally diverse 
drug-like molecules (such as those obtained 
using DOS) in biological assays is the way 
forward. 

Even if structural information about a target
is easy to obtain, FBS won't necessarily pro- 
vide any hits. We have experienced this when 
looking for modulators of protein-protein 
interactions (PPIs). Indeed, it can be argued 
that non-traditional pharmaceutical tar- 
gets — those, such as PPIs, that don’t involve 
enzymes or receptors — are unlikely to be 
suitable for fragment-based drug discovery. 
This is because the binding sites that are trac- 
table to pharmaceutical modulation in these 
systems are typically more highly exposed to 
water than are enzyme active sites or receptors; 
fragments tend to bind poorly to such sites. Yet 
such ‘undruggable’ targets are currently the 
most exciting for drug discovery. Screening
of DOS libraries has provided hits for several
non-traditional targets, including PPIs’.

Figure 2 | Generating lead compounds for drug discovery. a, In fragment-based screening, libraries
of structurally diverse, small molecules that could become fragments of active drugs are screened in
assays for a biological target. The lead compounds identified in this way have low affinities for the target,
but subsequent rounds of optimization (in which the structure of a lead is systematically altered and
enlarged) generate high-affinity, drug-like compounds for clinical trials. Differently coloured circles
represent different chemical groups. b, Using diversity-oriented synthesis, libraries of structurally diverse,
drug-like compounds are made as efficiently as possible, typically from common intermediates. The
libraries are then screened and the resulting lead compounds (which typically have higher affinities for
targets than do fragments) are optimized to produce candidates for clinical trials. The drug candidates
produced using the two approaches tend to differ from each other in many respects. Libraries produced
by either method typically contain hundreds or thousands of compounds.

Of course, DOS has its own inherent 
challenges — fragment libraries can, in prin- 
ciple, cover more chemical space with fewer 
compounds of a given molecular size than 
can DOS libraries, for example. But, in prac- 
tice, fragment libraries often have limited 
structural diversity and tend to be biased 
either towards compounds that satisfy the 
dogma of traditional medicinal chemistry or 
towards aromatic compounds (those contain- 
ing benzene rings or related ring structures), 
which are easily detected by NMR screening. 
Experience also shows that the optimization 
of leads from FBS is likely to generate flat mol- 
ecules as drug candidates. Nature, however, is 
three-dimensional, and so drugs are likely to 
be more selective for their targets if they too 
are three-dimensional. An advantage of DOS 
is that it typically comes up with new, three- 
dimensional molecular scaffolds. 

To be clear, we do acknowledge that FBS has
led to the discovery of drug-like compounds
in certain optimal cases. But we believe that
better compounds can be found using DOS, an
approach that is applicable for drug discovery
in general.


Warren R. J. D. Galloway and David R. 
Spring are in the Department of Chemistry, 
University of Cambridge, Cambridge
CB2 1EW, UK.

e-mail: spring@ch.cam.ac.uk 
PLANT BIOLOGY


Defence at dawn 


A remarkable example has been discovered of a plant tuning its immune defence
against a pathogen. The tuning consists of maximal expression of the relevant 
genes at the time of day when attack is most likely. SEE LETTER P.110 


C. ROBERTSON MCCLUNG 


Plants cannot avoid pathogens by fleeing, so they have evolved mechanisms
to resist attack. Those mechanisms include forms of immunity, and on page 110 of
this issue Wang et al.’ reveal an unexpected connection between plant immunity
and the circadian clock.

Plant immunity is controlled by a complex and incompletely understood signalling
network. The first level of defence consists of the recognition of molecules
called pathogen-associated molecular patterns (PAMPs) and the subsequent
initiation of PAMP-triggered immunity’. Because many pathogens can suppress
this response, often by means of proteins generically termed effectors, plants
have evolved disease-resistance (R) genes that recognize effectors or the action
of effectors on their host-cell targets. The consequent effector-triggered
immunity involves the reprogramming of gene transcription, which culminates in
physiological changes that include the hypersensitive (cell death) response’.

Plant pathogens can be divided into necrotrophs, which kill the host and feed on
the dead tissues, and biotrophs, which infect and feed off a living host without
killing it. R-gene-mediated defence is the primary response to biotrophs,
including the oomycete Hyaloperonospora arabidopsidis, which causes downy
mildew disease in the thale cress Arabidopsis (Fig. 1). In their paper,
Wang et al.’ describe novel components of R-gene-mediated resistance against
downy mildew in Arabidopsis, and show that these components are controlled by
a transcription factor called CIRCADIAN CLOCK-ASSOCIATED 1 (CCA1), which is an
essential component of the circadian clock.

Wang et al.' employed a systems approach. They compared changes in Arabidopsis
gene expression in response to H. arabidopsidis infection over time in wild-type
seedlings (which successfully resist this pathogen owing to the presence of the
RPP4 disease-resistance gene) with gene-expression changes in an rpp4 mutant
(which has lost RPP4 function and is sensitive to infection). Their examination
of events shortly after infection revealed several genes that responded to
H. arabidopsidis challenge in an RPP4-dependent manner but that had not
previously been identified as immune regulators or as components of the
response to R-gene activation.

The authors then tested mutants in which 
each of these novel plant genes was non-func- 
tional, to define a subset that is important for 
successful disease resistance. These genes could 
be divided into two clusters: one associated
mainly with the hypersensitive response and
the other with physico-chemical responses to
infection. Intriguingly, the promoter sequences
of both groups of genes showed over-representation
of binding sites for CCA1, implicating
the circadian clock in the regulation of
immunity.

Figure 1 | Hyaloperonospora arabidopsidis in spore-production mode. The
spores (dark particles) are borne on a structure known as the sporangiophore,
which here is about 0.7 millimetres in height.

The circadian clock is a timekeeping mecha- 
nism that enables the anticipation of environ- 
mental conditions or biological events that 
occur at predictable times of day’. For example, 
cold stress typically occurs at night, and the 
plant circadian clock modulates cold responses 
accordingly*. Similarly, H. arabidopsidis sporu- 
lates at night and disseminates its spores at 
dawn, perhaps because enhanced humidity 
optimizes germination and the colonization 
of new hosts’. 

Wang et al.' found that some Arabidop- 
sis mutants in which clock function was 
impaired showed increased susceptibility to 
H. arabidopsidis infection. They propose that,
in wild-type plants, disease resistance is maxi- 
mal at dawn, when the likelihood of encoun- 
tering H. arabidopsidis spores is greatest, and 
reduced at dusk. They suggest that the plant cir- 
cadian clock mediates a pulse of defence-gene 
expression near dawn, when CCA1 expression 
is high, in anticipation of possible H. arabidop- 
sidis challenge at that time. This anticipatory 
expression of defence genes, at a time when 
the potential for pathogen challenge is perhaps 
higher than during the rest of the day, could 
allow the need for maximal disease resistance 
to be balanced against the growth decrement 
associated with sustained expression of defence 
genes’. The central role of CCA1 is emphasized
by the reduced resistance to H. arabidopsidis at 
dawn in cca1 loss-of-function mutants; these 
same mutants showed no reduction in resist- 
ance, compared with wild-type plants, in 
response to H. arabidopsidis challenge at dusk, 
when CCA1 is not expressed. 

It is intriguing that CCA1 
serves as an integrator between 
the immune response and the 
circadian clock to elicit enhanced 
resistance, whereas a closely 
related transcription factor, LHY, 
does not: lhy mutants show no
decrement in disease resistance,
yet display a similar shortening
of circadian period to that seen in
cca1 mutants’. That may indicate
that the primary defect in disease
resistance in cca1 mutants is due
to the action of CCA1 on an out- 
put pathway that is modulated by 
the circadian clock, rather than on 
the oscillator mechanism itself. It 
is also intriguing that CCA1, but 
again not LHY, has a crucial role 
in the integration of nitrogen 
assimilation and the circadian 
clock’. 

The work of Wang et al.' has 
implications for understanding other plant 
pathogens. One example is another oomycete, 
Phytophthora infestans, which causes late blight 
in potatoes and was responsible for the potato 
famines of the 1840s*. A second example, of 
more recent notoriety, is the biotrophic Ug99 
strain of wheat stem rust (Puccinia graminis, 
a basidiomycetous fungus), which was first 
identified in Uganda in 1999 but now threat- 
ens wheat-growing regions in Asia’. Ug99 
overcomes the resistance afforded by most of 
the R genes in current cultivars. New insights 
into R-gene-mediated disease resistance may 
contribute to the development of wheat that is 
resistant to this devastating pathogen. 

The results’ may also apply beyond the plant 
world. For instance, many genes involved in 
innate immunity in fruitflies exhibit cyclic 
expression controlled by the central oscillator 
component CLOCK"”*. Wild-type flies show 
circadian variation in resistance to bacterial 
infection, peaking in the middle of the night".
Similarly, evidence is accumulating in support 
of circadian modulation of human immune 
responses”.

Finally, returning to H. arabidopsidis, it is 
not known whether the apparently rhythmic 
sporulation’ is a direct response to the light- 
dark cycle or instead represents a circadian 
rhythm driven by an internal clock. Nor is 
it known whether rhythmic sporulation is a 
characteristic of oomycetes in general. Rhyth- 
mic asexual spore production (conidiation) in 
the ascomycetous fungus Neurospora crassa 
has provided one of the most fruitful systems 
for investigating circadian rhythms”, and evi- 
dence from genome sequencing of many fungal 
species indicates that some clock proteins were 
present in the common ancestor of the fungi".
Oomycetes are only distantly related to fungi’, 
but the identification of a circadian component 
in plant resistance to oomycete infection justi- 
fies investigation of potential circadian rhyth- 
micity in oomycetous and fungal pathogens 
as well’®.
Big black hole found
in tiny galaxy 


Conventional wisdom tells us that supermassive black holes are found exclusively 
in massive galaxies undergoing little star formation. But one such object has 
now been discovered in a star-forming dwarf galaxy. SEE LETTER P.66 


JENNY E. GREENE 


On page 66 of this issue, Reines and
collaborators’ report a most unlikely
discovery: a big black hole in the centre
of a very small galaxy. The low-mass (or ‘dwarf’)
galaxy Henize 2-10, the focus of their research, 
is currently undergoing a major growth spurt. 
It seems to have formed an appreciable fraction 
of its stars in the past 10 million years’, which is 
unusually fast for present-day galaxies. Now it 
appears that Henize 2-10 is growing more than 
just stars. The authors detect a source of light 
that they attribute to gas falling into a super- 
massive black hole at the heart of the galaxy. 
This is the first potential detection of a black 
hole in a rapidly star-forming dwarf galaxy such
as Henize 2-10. If confirmed, this observation 
gives us the hitherto impossible opportunity to 
study at first hand the growth of black holes in 
a forming galaxy. 

Over the past three decades, astronomers 
have used Henize 2-10 and galaxies like it as 
laboratories for what might have happened in 
the first days of galaxy formation’. They can 
study how gas is converted into stars, and how 
and where those stars are formed, as the galaxy 
grows. It turns out that most young stars form 
in compact clusters within the growing galaxy. 

Despite many years of study, however, no 
central supermassive black hole had been dis- 
covered in Henize 2-10. Light from the young, 
newly forming stars had overwhelmed the
unique light signatures of the black hole and
masked its presence. But using new and archi- 
val high-resolution data, Reines et al.’ found 
a source of both radio and X-ray emission far 
from any site of new star formation. Although 
massive star birth and death is accompanied by 
sources of radio and X-ray emission, the rela- 
tive strengths of the radio and X-ray sources, 
combined with their distance from newly 
forming stars, strongly suggest a more exotic
origin — a radiating black hole. 

Reines and collaborators were aided by 
better data, but ultimately it was good intuition 
and imagination that led to their discovery. 
Conventional wisdom tells us that supermassive 
black holes with masses millions to billions 
times that of our Sun are found exclusively in 
massive galaxies, not in dwarf galaxies such as 
Henize 2-10. And usually, supermassive black 
holes are found in the parts of galaxies with 
regular elliptical shape and very little ongoing 
star formation, not in lumpy star-forming 
galaxies. The new discovery, while unexpec- 
ted, may represent an important opportunity 
to investigate the unknown origins of super- 
massive black holes. We do not yet have 
telescopes powerful enough to witness the 
interactions between the first black holes and 
their parent galaxies. Because Henize 2-10 is 
currently forming a large fraction of its stars, 
much like the host galaxies of the first black 
holes, we may finally have our chance to observe 
a growing black hole within a forming galaxy. 
The case is not yet watertight for a black hole
in the centre of Henize 2-10. We cannot rule 
out the chance alignment of two less exotic 
objects associated with star birth and death in 
such a complicated and dusty region. Furthermore,
some expected signatures of a black hole,
such as emission from highly excited neon 
atoms, are not observed. But without know- 
ing more precisely the mass and total power of 
the putative black hole, it is hard to guess how 
bright the emission should be. Reines et al. 
estimate a range of black-hole masses between 
100,000 and 10 million times the mass of the 
Sun, which places it at the low-mass end of 
supermassive black holes. Accurate estimates 
of the black-hole mass must await the next gen- 
eration of optical telescopes. In the meantime, 
Reines and collaborators are obtaining a more 
precise measure of the total power output of 
the black hole. 

Although the discovery of a massive black 
hole in a dwarf star-forming galaxy is unex- 
pected, there are many known examples of 
black holes in small galaxies. Indeed, recent 
work** has uncovered many examples of small 
galaxies containing black holes with masses 
approximately 10,000 times that of our Sun. 
Unfortunately, obtaining an accurate count of 
the black-hole population becomes increas- 
ingly difficult for low-mass black holes in small 
galaxies. Emission from these black holes is 
always faint, and because they are so small we 
cannot directly observe their gravitational pull 
on surrounding stars. 

Reines and collaborators’ have identified 
an entirely new parent population of galaxies 
where many unknown supermassive black 
holes could be lurking. There are many things 
Henize 2-10 may teach us. First, black-hole 
growth might inhibit star formation in fragile, 
first-generation galaxies’. To test this hypothesis, 
we can look for signs that energy from the black 
hole is heating or removing gas in Henize 2-10. 
Second, the location of the putative black hole 
away from any star cluster is intriguing, because 
most proposed mechanisms for making new 
black holes involve massive stars’. And perhaps 
most importantly, we will now start looking for
more black holes in dwarf star-forming galaxies. 
Henize 2-10 reminds us to keep our eyes open 
and expect the unexpected.
DNA fragility put
into context


Fragile sites are genomic regions prone to deletions or other alterations during 
DNA replication. The reason for the susceptibility of these sites to damage may be 
simple: they contain few replication initiation sites. SEE LETTER P.120 


KAY HUEBNER 


Agents that hinder DNA replication can
cause breaks at specific chromosome
regions commonly called fragile
sites. Two questions have been hotly debated 
concerning these chromosome regions: why 
are they so sensitive to DNA damage; and 
how does damage to genes encompassing 
such sites affect the biology of the damaged 
cells? A paper by Letessier et al.’ on page 120 
of this issue presents a rather simple, yet ele- 
gant, answer to the first of these questions. 
The authors also report that, surprisingly, the 
sensitivity of specific fragile sites within a cell 
depends on the tissue or organ from which the 
cell originates — a conclusion that may provide 
clues to the answer to the second question.

During DNA replication, the enzyme helicase
breaks hydrogen bonds that hold the two
DNA strands together, uncoiling the DNA 
spiral to form a structure called the replication 
fork. In humans, mice and other mammals, the 
propensity of common fragile sites to deletions 
and other rearrangements has been attributed to 
various features of the specific DNA sequences 
at fragile regions, which can span more than 
1 million base pairs. For instance, DNA 
sequences at fragile sites might be prone to 
forming secondary structures that impair the 
movement of the replication fork, leading to 
its collapse and to DNA breaks’; fragile regions 
might replicate late in the DNA-replicative 
phase — even later in the presence of DNA- 
damaging agents — and so their replication 
remains incomplete’; or differences in the com- 
position of DNA-associated proteins between 
fragile and non-fragile regions’ may account for 
the susceptibility of the former to breakage.

Letessier and colleagues' point to a different
reason. The authors focus on FRA3B, which is the
most active fragile site in a type of human white
blood cell called a lymphoblast and which lies
within a tumour-suppressor gene known as
FHIT (Fig. 1). They find that FRA3B is a fragile
site not because the replication fork is slowed
or stalled at this genomic region (locus), but
because of the scarcity of replication initiation
events there.

Figure 1 | Damage-prone DNA. An image of
normal chromosomes in a human lymphoblast.
The FRA3B region (tagged by a green DNA probe
flanking the gaps on both chromosomes 3) is a
fragile site prone to breakage. Letessier et al.' show
that one reason for the fragility of this genomic
region is a paucity of initiation sites for DNA
replication. (Image adapted from ref. 10.)

Indeed, the study shows that in lympho- 
blasts — but not in fibroblasts, which originate 
from connective tissue — initiation events are 
entirely absent from the central fragile region 
of FRA3B. Consequently, the replication of this 
large region within FHIT can be completed 
only by convergence of replication forks from
the flanking regions (see Fig. 3a of the paper 
on page 122). The authors propose, therefore, 
that common chromosome fragile sites corre- 
spond to the initiation-poor regions that finish 
replication last in the given cell type. 

To visualize DNA-replication dynamics in 
the fragile regions, Letessier et al. analysed 
stretched-out DNA strands, and searched 
for newly synthesized, fluorescently tagged 
FRA3B regions. They could thus examine 
both replication-fork speed and fork slow- 
ing, as well as mapping replication start and 
stop sites within FRA3B in lymphoblasts. The 
authors report that, within this fragile site, 
forks did not stall. Instead, the 700-kilobase 
core region of FRA3B, which contains the 
protein-coding portion of FHIT — the region 
most often mutated by deletion in cancers and 
precancerous lesions — showed no initiation 
events. By comparison, roughly ten initiation 
events occurred in similarly sized, non-fragile 
regions. These results suggest that a paucity of 
initiation events contributes to fragility. 
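The link between sparse initiation and late completion can be made concrete with simple arithmetic on the figures quoted above. The sketch below is a toy calculation rather than the authors' analysis: the function name is hypothetical, and it assumes evenly spaced initiation events that each fire two forks, with forks also entering from both flanks.

```python
def max_fork_travel_kb(region_kb: float, n_initiations: int) -> float:
    """Longest distance (in kilobases) a single fork must cover.

    Assumptions for illustration only: initiation events are evenly
    spaced, each fires forks in both directions, and forks also enter
    the region from both flanks. With no internal initiation, the two
    flanking forks must meet in the middle of the region.
    """
    if n_initiations < 0:
        raise ValueError("n_initiations must be non-negative")
    if n_initiations == 0:
        return region_kb / 2
    spacing_kb = region_kb / n_initiations
    return spacing_kb / 2


core_kb = 700  # size of the FRA3B core region quoted in the text

fragile = max_fork_travel_kb(core_kb, 0)       # no initiation events
non_fragile = max_fork_travel_kb(core_kb, 10)  # roughly ten events

print(f"fragile core:       {fragile:.0f} kb per fork")
print(f"non-fragile region: {non_fragile:.0f} kb per fork")
```

On these assumptions, each fork in the initiation-free core has about ten times as far to travel as a fork in a comparable non-fragile region, which is consistent with the proposal that such regions are the last to finish replication.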

A previous genome-wide analysis of 
replication timing’ in lymphoblasts and fibro- 
blasts revealed that, in these two cell types, the 
dynamics of replication (its timing and initia- 
tion patterns) are very different. For example, 
whereas initiation events occurred within the 
fragile sites in fibroblasts, they were consist- 
ently scarce in lymphoblasts. Do such differ- 
ences in initiation of replication between the 
two cell types — at least when grown in cul- 
ture — mean that the fragility of FRA3B also 
differs between them? According to Letessier 
and co-workers, the answer is indeed yes. 

The authors find that fibroblasts do not 
exhibit the breaks at FRA3B that are charac- 
teristically seen in lymphoblasts. They also 
report similar findings for the second most 
active fragile region known in lymphocytes, 
FRA16D. Together, these data indicate that
common chromosome fragile regions are 
“loci that correspond to the latest initiation- 
poor regions to complete replication in a given 
cell type”’. How such differential replication 
programs in cells of different tissue origin are 
established is unknown, but it cannot depend 
on DNA sequence because the DNA sequence 
of the FRA3B locus is the same in fibroblasts 
and lymphoblasts; likewise, the sequence of 
FRA16D is the same in the two cell types. 

Common chromosomal fragile sites have 
previously been assessed in lymphoblasts,
fibroblasts, some cancer cells of epithelial
origin®’ and even in cells of different lineages 
in a single study®. What has been lacking, 
however, is a systematic comparison of the 
frequency of breaks at known fragile sites in 
different cell types from a range of organs and 
tissues. Yet it is known that FHIT and WWOX
(the two genes at the two most fragile loci in
lymphoblasts) are also among the sequences
most frequently altered by DNA deletion in 
precancerous and cancerous cells of epithelial 
origin (that is, cells from internal organs, not 
lymphoblasts); this is presumably due to expo- 
sure of cells within lung, colon and breast to 
agents that cause replicative stress”. 

It has been argued that damage to genes at 
fragile sites, and the consequent loss of the 
genes’ expression, contributes to the selective 
growth of precancerous lesions and cancerous 
tumours. A counter-argument is that, because
of the frequent deletions within the fragile 
sites, the loss of any associated gene expression 
is an unselected ‘passenger’ event in cancers” 
and does not drive the expansive growth of the 
cancer cells. 

A comparison of the frequency of breaks 
at known fragile sites in a range of cell types, 
including the epithelial cells of the lung, colon, 
breast and prostate — where most human can- 
cers originate — could shed light on several 
questions arising from the current study’. For 
instance, are FRA3B and FRA16D the most 
fragile regions in these cells, as they are in
lymphoblasts? If fibroblasts show infrequent DNA
damage at these sites, do they contain a different
set of fragile sites, or do they show a low
frequency of breaks across all chromosomes?
Do the frequency and sites of breaks at fragile
regions in cultured cells correspond to those
in the same cell type within its organ of origin?
Finally, if FRA3B is not the most fragile site in
epithelial cells, does that strengthen the argument
that breaks in FHIT in precancerous cells
and cancers have contributed to progressive
growth of the lesions? Undoubtedly, Letessier
and colleagues, and indeed other scientists,
will be searching for the answers.


NEWS & VIEWS | RESEARCH

THEORETICAL ECOLOGY


Waltz of the weevil


The aquatic plant Salvinia molesta is a widespread pest of waterways in the 
tropics and subtropics. A study of its control by a weevil in Australian billabongs 
sets a new standard in ecological time-series analysis. SEE LETTER P.86 


LEWI STONE 


Waltzing Matilda, the bush ballad
that became Australia’s unofficial
national anthem, relates the exploits
of an out-of-control sheep shearer “camped 
by a billabong” (a small stagnant lake) in the 
Australian outback. On page 86 of this issue we 
read about the findings of an inspired group of
theoretical ecologists’ and their models of out- 
of-control billabongs. Schooler et al.' present 
a mathematical modelling study of an inva- 
sive plant species, the noxious weed Salvinia 
molesta, and its erratic large-scale outbreaks 
in four billabongs over a period of several 
decades. The authors’ skilfully executed 
modelling is an imaginative combination of 
nonlinear dynamics, statistical inference and 
stochastic time-series analysis. 


The story of Salvinia has become a classic in 
the biological invasion literature’. This aquatic 
plant from South America is notorious for its 
rapid, almost uncontrollable growth, and since 
1939 has become a pest in regions far from 
its home range. The weed is able to double 
its biomass every 3-4 days, generating thick 
mats of plant material that often cover entire 
water bodies (Fig. 1). Salvinia infestations can 
have devastating effects on lakes, billabongs 
and waterways, rendering them unusable for 
irrigation, as sources of drinking water or for 
sustaining local fish populations. In the past, 
entire villages in Papua New Guinea have been 
abandoned and the inhabitants relocated as a 
consequence of this out-of-control weed. 

Salvinia first appeared in Australia’s Kakadu 
National Park in 1983 and, within months, 
biological control was implemented by the

50 Years Ago


The Committee on Scientific 
Research in Schools which was 
established by the Council of the 
Royal Society in 1957 has issued 

a report covering the period 
November 1, 1959-October 31, 
1960. It is stated that interest shown 
by schools in undertaking research 
has continued to increase, and the 
Committee is now administering 
research projects in 56 schools. 
Some of these schools have more 
than one research project under 
way and a total of 66 separate 
research projects are now being 
actively pursued compared with 
57 last year ... All research projects 
with which the Committee is 
dealing are carried out with the 
specialist advice and assistance 

of Fellows of the Society and 

by others who act as advisers. 

The Committee invites further 
requests from school-masters and 
school-mistresses for assistance in 
undertaking research. 

From Nature 4 February 1961 


100 Years Ago 


Many attempts have been made 

to synchronise the phonograph 

or gramophone with the 
kinematograph, so as to be able 

to reproduce simultaneously the 
sounds of the voice, as in singing 
and speech, while the movements 
of the face and the bodily gestures 
of the singer or speaker are depicted 
on the screen ... The difficulties, 
however, appear to have been 
surmounted by M. Gaumont... 

The details of the method are not 
fully developed, but they are to be 
made public without delay ... We 
may soon have in our homes the 
chefs-d’oeuvre of our theatres played
by our best actors, and even lectures 
by famous professors may not be 
restricted to their class-rooms ... 
Such reproductions are to be called 
phonoscenes. 

From Nature 2 February 1911 




Figure 1 | Salvinia infestation of the Howard River, near Darwin, Australia. a, The scene before the
advent of Salvinia in July 1984. b, One month later, with Salvinia rampant. c, February 1985, and the
infestation is dying after attack by the weevil Cyrtobagous salviniae. Photographs taken by Colin Wilson
(at risk of crocodile attack).


introduction of the weevil Cyrtobagous salviniae. 
Unusually, this beetle can feed on Salvinia 
possibly as fast as the weed can grow, causing 
the latter’s dense mats to turn brown through 
decomposition (Fig. 1). In some cases the weevil 
can remove 99% of a large-scale Salvinia out- 
break — which can comprise tens of thousands 
of tonnes of biomass — within a year. 

Schooler et al.’ elaborate on how the annual 
flooding of the billabongs further complicates 
the outbreak dynamics. Flooding tends to 
flush Salvinia downstream, allowing the pest 
to invade other billabongs or to find refuge in 
terrestrial sites. The ability of Salvinia to escape 
during flooding makes it almost impossible to 
eradicate the weed, despite the high efficiency 
of the biological control. Hence outbreaks 
recur, appearing at erratic and unpredictable 
intervals as they manage to evade the weevil’s 
stranglehold. 

The jumps from periods of control to peri- 
ods of outbreaks have allowed Schooler et al.' to 
draw upon the powerful theoretical framework 
of alternative stable states, which has proved 
particularly relevant to those ecosystems in 
which abrupt changes and catastrophic shifts 
are intrinsic features” *. The authors were able 
to formulate the basic structure of a mathemati-
cal model that suits the Salvinia–weevil system.
The nonlinear model has a deterministic 
‘skeleton’ that is driven by environmental sto- 
chasticity, and takes into account the observed 
time-series measurement errors in Salvinia 
and weevil population abundance. But match- 
ing the model to the time series proved to be 
a formidable challenge. The high stochasticity 
and unpredictability of the billabong system,
combined with the complex dynamics intro-
duced by periodic flooding, mean that, using
most standard techniques, it is extremely diffi- 
cult to interpret which of the different states the 
system is moving towards at any point in time. 

Modern time-series analysis came to the 
rescue. Theoretical ecologists are familiar with 
the contributions of one of the authors (Ives) 
to ecological time-series analysis*”; Ives and 
his Australian co-authors now present¹ fur-
ther innovation. More specifically, they make
use of the (extended) Kalman filter, a statisti- 
cal technique for which its inventor, engineer 
Rudolf Kalman, received the US National 
Medal of Science in 2008. The filter smooths 
out the system’s stochasticity and, in parallel, 
provides an estimate of the model’s statistical 
likelihood. 
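
As a rough illustration of how a Kalman filter yields a smoothed state estimate and a likelihood in the same pass, the sketch below runs a scalar linear filter over a noisy series. It is a simplified stand-in, not the extended filter or the Salvinia–weevil model used by Schooler et al.; the function name `kalman_loglik` and the parameters `a` (state persistence), `q` (process variance) and `r` (measurement variance) are assumptions made for this example.

```python
import math

def kalman_loglik(ys, a, q, r, x0=0.0, p0=1.0):
    """Scalar Kalman filter for x_t = a*x_{t-1} + w_t, y_t = x_t + v_t.

    Returns the filtered state estimates and the Gaussian log-likelihood
    of the observations (prediction-error decomposition).
    Illustrative linear model only; all names here are hypothetical.
    """
    x, p = x0, p0
    loglik = 0.0
    filtered = []
    for y in ys:
        # Predict the next state and its variance.
        x_pred = a * x
        p_pred = a * a * p + q
        # Innovation (one-step prediction error) and its variance.
        innov = y - x_pred
        s = p_pred + r
        loglik += -0.5 * (math.log(2.0 * math.pi * s) + innov * innov / s)
        # Update the estimate with the Kalman gain.
        gain = p_pred / s
        x = x_pred + gain * innov
        p = (1.0 - gain) * p_pred
        filtered.append(x)
    return filtered, loglik
```

Comparing the returned log-likelihood across candidate parameter values, or across candidate model structures, is the kind of model comparison the text goes on to describe.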

With such an estimate, it became possible to 
fine-tune the model structure by comparing a 
suite of different possibilities, while fitting the 
models to the observed billabong data. This 
allowed determination of the most reason- 
able model, and homed in on the best-fitting 
parameter estimates in a statistically rigor- 
ous manner. These methods are now finding 
exciting biological applications'®, but in prac- 
tice their complexity would normally call for 
the participation of a versatile mathematician.
Hence the importance of multidisciplinary 
cooperation. 

With the final model in hand, Schooler et al.' 
initiated a theoretical study of the nonlinear 
dynamics of its deterministic skeleton and 
investigated the existence of possible alterna- 
tive stable states, and the manner in which they 
depend on control parameters. Two states were 


identified: a low, Salvinia-free state, and a high 
state of dense Salvinia biomass. Under some 
conditions, these two stable states can coexist 
and population trajectories may be attracted 
to either state. Under other conditions there 
might be only a single state that is attractive. 
With flooding events and stochastic forcing, 
the system may be bouncing in a complex way 
between states, making it almost impossible to 
ascertain the underlying rules or patterns just 
by looking at the Salvinia or weevil time series. 
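
The idea of two coexisting stable states can be made concrete with a generic textbook model. The sketch below is not the authors' model: it assumes a dimensionless logistic growth term minus a saturating consumption term, with illustrative parameters `r = 0.5` and `k = 10`, and simply integrates forward from two nearby starting densities.

```python
def growth_rate(x, r=0.5, k=10.0):
    """Logistic growth minus saturating consumption (dimensionless, illustrative)."""
    return r * x * (1.0 - x / k) - x * x / (1.0 + x * x)

def settle(x0, dt=0.01, steps=50_000):
    """Integrate dx/dt = growth_rate(x) with forward Euler and return the end state."""
    x = x0
    for _ in range(steps):
        x += dt * growth_rate(x)
    return x

# Two starting densities on either side of the unstable equilibrium at x = 2.
low_state = settle(1.5)   # relaxes to the low-biomass state
high_state = settle(2.5)  # relaxes to the high-biomass state
```

For these parameters the nonzero equilibria solve x³ − 10x² + 21x − 10 = 0, so the low and high states sit at 4 − √11 and 4 + √11 with an unstable point at 2 between them. Adding noise or a periodic flushing term to such a skeleton is what makes the observed trajectory hop between basins.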

The model that emerges from Schooler and 
colleagues’ analysis' provides a useful tool for 
understanding the driving forces behind the 
Salvinia–weevil system, and its alternative
stable states, that would otherwise be dif-
ficult to identify. With that as background,
the authors discuss how the modelling frame-
work helps to suggest practical solutions for
biological control. In particular, they argue that 
it may be possible to take advantage of the sys- 
tem’s stochastic fluctuations and its associated 
erratic jumping between alternative states. In 
the higher state, when Salvinia biomass is at 
high density, weevil control is least effective. 
However, augmenting weevil control at those 
times when the system is attracted to its lower 
state might possibly trap the system into a sta- 
ble Salvinia-free state. The model could thus 
help managers to identify the optimal time to 
apply biological control. 

In all, Schooler and colleagues’ careful 
attention to data, and their development and 
implementation of modelling techniques, 


Methane and monsoons 


The rising trend in atmospheric concentrations of methane over the past 
5,000 years has been attributed to human agency. A modelling study, of a power 
that has only now become possible, points to another cause. SEE LETTER P.82 


ERIC W. WOLFF 


Methane is a potent greenhouse gas,
and influences the levels of other
atmospheric constituents. The huge 

increase, of about 150%, to nearly 1,800 parts 
per billion by volume (p.p.b.v.) in atmospheric 
concentration over the past two centuries’ is 
clearly caused by human activities. 
However, methane concentration 
also increased significantly, from 
about 550 to 700 p.p.b.v., over the 
previous 5,000 years — the later part 
of the (interglacial) Holocene epoch 
that began some 10,000 years ago. 
There has been intense debate about 
whether this rise was also anthropo- 
genic or was due to changes in natural 
sources and sinks. Using models of 
climate, vegetation and emissions, 
Singarayer et al.” show how the 
increase could have arisen from 
natural causes (page 82 of this issue). 
Detailed ice-core data for methane 
now cover the past 800,000 years’. 
They show a characteristic pattern 
over glacial-interglacial cycles, with 
higher values during interglacials. 
From the combined use of methane- 
concentration and isotopic data, 
it seems that the main cause of the 
glacial—interglacial rise was almost 
certainly an increase in the strength 
of wetland methane sources’, per- 
haps allied to a weakening of the 
atmospheric sink. Rapid fluctuations, 




simultaneous with fast, millennial-scale 
climatic changes in the Northern Hemisphere, 
are also seen. Finally, the ‘envelope’ of data 
seems to follow closely the pattern of pre- 
cession in Earth's orbit, which has a roughly 
20,000-year cycle. The apparent reason for this 
is that tropical wetland emissions of methane 
respond to the amount of incoming solar

set a new standard in ecological time-series
analysis. Their approach promises to have
many applications in future studies of noisy
biological data sets.

radiation (insolation) in summer at northern
low latitudes. Insolation reaches its maximum 
during the part of the precession cycle when 
the elliptical orbit of Earth takes the planet 
closest to the Sun during northern summer. 
The result is a stronger monsoon in Asia and 
other regions, with more summer precipita- 
tion, and consequently greater wetland areas 
and methane production by soil-dwelling 
microorganisms. 

However, the increase of the past 5,000 years 
departed from this pattern, with an increase 
in atmospheric methane concentration at a 
time when northern summer insolation was 
decreasing. In influential papers**®, Ruddiman 
proposed that the increase was due to human, 
especially agricultural, activity, which over- 
whelmed the variations in natural sources, 
even 5,000 years ago. There has been debate 

about whether the much smaller 
human population of that period 
could really have had such a domi-
nant effect. A strength of the hypoth-
esis has been that the pattern of the

past few millennia differed from that 
of earlier interglacials, which more 
closely followed the precessional 
insolation pattern. 

Singarayer and colleagues’ 
approach’ involved an intensive 
modelling programme. To provide 
snapshots roughly every 2,000 years 
over the last glacial cycle, span-
ning 130,000 years, they ‘forced’ the

Figure 1 | Atmospheric methane concentrations during the present
and last interglacials. Almost all of the data come from ice cores”. 
The curves are for the past 10,000 years — the Holocene — and for 

the equivalent period (125,000-115,000 years ago, in terms of orbital 
precession) in the last interglacial. The red curve (with modern 
atmospheric data in blue) shows methane levels during the present 
interglacial, with a rise commencing 5,000 years ago. The green curve 
shows the contrasting continual decrease in methane during the last 
interglacial. Singarayer and colleagues’ modelling study” can explain the 
trends in both interglacials in terms of Earth's orbit, except for the past 
200 years, when a marked anthropogenic effect has occurred. 


Hadley Centre's HadCM3 coupled
ocean-atmosphere general cir- 
culation model (GCM) with the 
appropriate orbital and ice-sheet con- 
figurations, and with greenhouse-gas 
concentrations, and ran the model to 
equilibrium. Other simulations were 
carried out, in which one or more of 
these forcings was held constant to 
isolate the causes of change. 

The authors then fed the out- 
put of each climate simulation 
through a series of offline models


A closely packed system of low-mass,
low-density planets transiting Kepler-11 


Jack J. Lissauer, Daniel C. Fabrycky, Eric B. Ford, William J. Borucki, François Fressin, Geoffrey W. Marcy, Jerome A. Orosz,
Jason F. Rowe, Guillermo Torres, William F. Welsh, Natalie M. Batalha, Stephen T. Bryson, Lars A. Buchhave,
Edward W. Dunham, Michael N. Fanelli, Jonathan J. Fortney, Thomas N. Gautier III, John C. Geary, Ronald L. Gilliland,
Neil Miller, Robert C. Morehead, Elisa V. Quintana, Darin Ragozzine, Dimitar Sasselov, Donald R. Short & Jason H. Steffen


When an extrasolar planet passes in front of (transits) its star, its radius can be measured from the decrease in starlight 
and its orbital period from the time between transits. Multiple planets transiting the same star reveal much more: period 
ratios determine stability and dynamics, mutual gravitational interactions reflect planet masses and orbital shapes, and 
the fraction of transiting planets observed as multiples has implications for the planarity of planetary systems. But few 
stars have more than one known transiting planet, and none has more than three. Here we report Kepler spacecraft 
observations of a single Sun-like star, which we call Kepler-11, that reveal six transiting planets, five with orbital periods 
between 10 and 47 days and a sixth planet with a longer period. The five inner planets are among the smallest for which 
mass and size have both been measured, and these measurements imply substantial envelopes of light gases. The degree
of coplanarity and proximity of the planetary orbits imply energy dissipation near the end of planet formation.


Kepler is a 0.95-m-aperture space telescope using transit photometry 
to determine the frequency and characteristics of planets and plan- 
etary systems’ *. The only fully validated multiple transiting planet 
system to appear in the literature to date is Kepler-9, with two giant 
planets’ orbiting exterior to a planet whose radius is only 1.6 times 
that of Earth®. The Kepler-10 system’ contains one confirmed planet 
and an additional unconfirmed planetary candidate. Light curves of 
five other Kepler target stars, each with two or three (unverified) 
candidate transiting planets, have also been published*. A catalogue 
of all candidate planets, including targets with multiple candidates, is 
presented in ref. 35. 

We describe below a six-planet system orbiting a star that we 
name Kepler-11. First, we discuss the spacecraft photometry on 
which the discovery is based. Second, we summarize the stellar prop- 
erties, primarily constrained using ground-based spectroscopy. Then 
we show that slight deviations of transit times from exact periodicity 
owing to mutual gravitational interactions confirm the planetary 
nature of the five inner candidates and provide mass estimates. 
Next, the outer planet candidate is validated by computing an upper 
bound on the probability that it could result from known classes of 
astrophysical false positives. We then assess the dynamical properties 
of the system, including long-term stability, eccentricities and rela- 
tive inclinations of the planets’ orbital planes. We conclude with a 
discussion of constraints on the compositions of the planets and 
the clues that the compositions of these planets and their orbital 
dynamics provide for the structure and formation of planetary 
systems. 


Kepler photometry 

The light curve of the target star Kepler-11 is shown in Fig. 1. After 
detrending, six sets of periodic dips of depth of roughly one milli- 
magnitude (0.1%) can be seen. When the curves are phased with these 
six periods, each set of dips (Fig. 2) is consistent with a model’ of a 
dark, circular disk masking out light from the same limb-darkened 
stellar disk; that is, evidence of multiple planets transiting the same 
star. We denote the planets in order of increasing distance from the 
star as Kepler-11b, Kepler-11c, Kepler-11d, Kepler-11e, Kepler-11f
and Kepler-11g. 
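
To show how a dip of about a millimagnitude translates into a planet size, the sketch below converts a transit depth to a radius using the simplest possible geometry: a dark disk on a uniformly bright star, so that the fractional flux drop equals (R_planet/R_star)². Limb darkening and off-centre transits are ignored, so the result is only a rough check against the fitted radii; the function name and the solar-to-Earth radius ratio of 109.2 are assumptions of this example.

```python
import math

R_SUN_IN_EARTH_RADII = 109.2  # approximate ratio of solar to Earth radius

def radius_from_depth(depth_mmag, r_star_rsun):
    """Planet radius (Earth radii) from transit depth in millimagnitudes.

    Assumes a uniform stellar disk and a central transit (illustrative only).
    """
    # Convert a magnitude change to a fractional drop in flux.
    flux_drop = 1.0 - 10.0 ** (-0.4 * depth_mmag / 1000.0)
    return r_star_rsun * R_SUN_IN_EARTH_RADII * math.sqrt(flux_drop)

# Depth of 1.40 millimagnitude (Kepler-11e) and a stellar radius of 1.1 solar radii.
radius_e = radius_from_depth(1.40, 1.1)
```

With these inputs the estimate comes out a little above 4 Earth radii, in the same range as the tabulated value for Kepler-11e, which comes from a full light-curve model.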

Background eclipsing binary stars can mimic the signal of a transi- 
ting planet’®. Kepler returns data for each target as an array of pixels, 
enabling post-processing on the ground to determine the shift, if any, 
of the location of the target during the apparent transits. For all six 
planetary candidates of Kepler-11, these locations are coincident, with 
3σ uncertainties of 0.7 arcseconds or less for the four largest planets
and 1.4 arcseconds for the two smallest planets; see the first section of 
the Supplementary Information and Supplementary Table 1 for 
details. This lack of displacement during transit substantially restricts 
the parameter space available for background eclipsing binary star 
false positives. 

Supplementary Table 2 lists the measured transit depths and dura- 
tions for each of the planets. The durations of the drops in flux caused 
by three of the planets are consistent with near-central transits of the 
same star by planets on circular orbits. Kepler-11e’s transits are one- 
third shorter than expected, implying an inclination to the plane of the 
sky of 88.8° (orbital eccentricity can also affect transit duration, but

Figure 1 | Light curves of Kepler-11, raw and detrended. Kepler-11 is a G
dwarf star with Kepler magnitude of 13.7, visual magnitude of 14.2,
and celestial coordinates RA = 19h 48 min 27.62 s, dec. = +41° 54’ 32.9"; 
alternative designations used in catalogues are KIC 6541920 and KOI-157. 
Kepler-11 is about 2,000 light-years from Earth. Variations in the brightness of 
Kepler-11 have been monitored with an effective duty cycle of 91% over the 
time interval barycentric Julian date (BJD) 2,454,964.511 to 2,455,462.296, with 
data returned to Earth at a long cadence of 29.426 min. Shown are Kepler 
photometric data, raw from the spacecraft with each quarter normalized to its 
median (a) and after detrending with a polynomial filter (b)*’; t represents time 


the eccentricity needed to explain this duration for a central transit 
would destabilize the system). The transit durations of planets Kepler- 
11b and Kepler-11f suggest somewhat non-central transits. In sum, 
the light curve shapes imply that the system is not perfectly coplanar: 
Kepler-11g and Kepler-11e are mutually inclined by at least ~0.6°. 


Ground-based spectroscopy 

We performed a standard spectroscopic analysis (refs 11, 12) of a high-resolu-
tion spectrum of Kepler-11 taken at the Keck I telescope. We derive an
effective temperature of T_eff = 5,680 ± 100 K, surface gravity g of
log[g (cm s⁻²)] = 4.3 ± 0.2, metallicity of [Fe/H] = 0.0 ± 0.1 dex,
and a projected stellar equatorial rotation of v sin i = 0.4 ± 0.5 km s⁻¹.
Combining these measurements with stellar evolutionary tracks
yields estimates of the star’s mass, 0.95 ± 0.10 M☉, and radius,
1.1 ± 0.1 R☉, where the subscript ☉ signifies solar values. Estimates
of the stellar density based upon transit observations are consistent 
with these spectroscopically determined parameters. Therefore, we 
adopt these stellar values for the rest of the paper, and note that the 
planet radii scale linearly with the stellar radius. Additional details on 
these studies are provided in section 2 of the Supplementary 
Information. 
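
As a quick consistency check on the adopted stellar values, the surface gravity implied by the quoted mass and radius can be computed from g = GM/R². This is only a sanity check under assumed physical constants (CGS values hard-coded below), not part of the spectroscopic analysis.

```python
import math

G_CGS = 6.674e-8      # gravitational constant, cm^3 g^-1 s^-2
M_SUN_G = 1.989e33    # solar mass in grams
R_SUN_CM = 6.957e10   # solar radius in centimetres

def log_surface_gravity(mass_msun, radius_rsun):
    """Return log10 of surface gravity in cm s^-2 for a star of given mass and radius."""
    g = G_CGS * mass_msun * M_SUN_G / (radius_rsun * R_SUN_CM) ** 2
    return math.log10(g)

# Central values of the stellar mass and radius (solar units).
log_g = log_surface_gravity(0.95, 1.1)
```

The result, about 4.33, falls inside the spectroscopic range for log g.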




Transit timing variations 
Transits of a single planet on a Keplerian orbit about its star must be 
strictly periodic. In contrast, the gravitational interactions among planets 
in a multiple planet system cause planets to speed up and slow down by 
small amounts, leading to deviations from exact periodicity of transits'*"°. 
Such variations are strongest when planetary orbital periods are com- 
mensurate or nearly so, which is the case for the giant planets Kepler-9b 
and Kepler-9c (ref. 5), or when planets orbit close to one another, which is 
the case for the inner five transiting planets of Kepler-11. 

The transit times of all six planets are listed in Supplementary Table 
2. Deviations of these times from the best-fitting linear ephemeris 
(transit timing variations, or TTVs) are plotted in Fig. 3. We modelled 


54 | NATURE | VOL 470 | 3 FEBRUARY 2011 


in days since BJD 2,455,000. These data are available from the MAST archive at 
http://archive.stsci.edu/kepler/. Note the difference in vertical scales between
the two panels. The six sets of periodic transits are indicated with dots of 
differing colours. Four photometric data points representing the triply 
concurrent transit of planets Kepler-11b, Kepler-11d and Kepler-11e at

BJD 2,455,435.2 (Supplementary Fig. 12) are not shown because their values lie 
below the plotted range. Data have also been returned for this target star at a 
cadence of 58.85 s since BJD 2,455,093.215, but our analysis is based exclusively 
on the long cadence data. 


these deviations with a system of coplanar, gravitationally interacting 
planets using numerical integrations”’” (Supplementary Information). 
The TTVs for each planet are dominated by the perturbations from its 
immediate neighbours (Supplementary Fig. 5). The relative periods 
and phases of each pair of planets, and to a lesser extent the small 
eccentricities, determine the shapes of the curves in Fig. 3; the mass 
of each perturber determines the amplitudes. Thus this TTV analysis 
allows us to estimate the masses of the inner five planets and to place 
constraints on their eccentricities. We report the main results in
Table 1 and detailed fitting statistics in Supplementary Fig. 5 and
associated text.
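
The definition of a transit timing variation, observed mid-time minus a best-fitting linear ephemeris, can be sketched in a few lines. The example below fits the ephemeris by ordinary least squares and returns the residuals; the synthetic transit times (period, epoch and a 0.01-day sinusoidal perturbation) are made up for illustration and are not the Kepler-11 measurements, and the dynamical modelling of the residuals is not attempted.

```python
import math

def fit_linear_ephemeris(epochs, times):
    """Least-squares fit of times = t0 + period * epoch; returns (t0, period)."""
    n = len(epochs)
    mean_e = sum(epochs) / n
    mean_t = sum(times) / n
    sxx = sum((e - mean_e) ** 2 for e in epochs)
    sxy = sum((e - mean_e) * (t - mean_t) for e, t in zip(epochs, times))
    period = sxy / sxx
    t0 = mean_t - period * mean_e
    return t0, period

def transit_timing_variations(epochs, times):
    """Observed mid-times minus the fitted linear ephemeris."""
    t0, period = fit_linear_ephemeris(epochs, times)
    return [t - (t0 + period * e) for e, t in zip(epochs, times)]

# Synthetic example: strictly periodic transits plus a small periodic perturbation.
epochs = list(range(40))
times = [100.0 + 13.025 * e + 0.01 * math.sin(2.0 * math.pi * e / 10.0) for e in epochs]
t0_fit, period_fit = fit_linear_ephemeris(epochs, times)
ttv = transit_timing_variations(epochs, times)
```

For an unperturbed planet the residuals would scatter around zero with no structure; a coherent pattern like the one injected here is the signature that the numerical integrations are fitted to.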

Perturbations of planets Kepler-11d and Kepler-11f by planet
Kepler-11e are clearly observed. These variations confirm that all
three sets of transits are produced by planets orbiting the same star
and yield a 4σ detection of the mass of Kepler-11e. Somewhat weaker
perturbations are observed in the opposite direction, yielding a 3σ
detection of the mass of Kepler-11d and a 2σ detection of the mass of
Kepler-11f.

The inner pair of observed planets, Kepler-11b and Kepler-11c, lie 
near a 5:4 orbital period resonance and strongly interact with one 
another. The degree to which they deviate from exact resonance deter- 
mines the frequency at which their TTVs should occur. Even though 
the precision of individual transit times is low owing to small transit 
depths, transit-timing periodograms of both planets show peak power 
at the expected frequency (Supplementary Fig. 4). The TTVs thus 
confirm that Kepler-11b and Kepler-11c are planets, confirm that they 
orbit the same star, and yield 2σ determinations of their masses. The
outer planet, Kepler-11g, does not strongly interact with the others; it
would need to be unexpectedly massive (~1 M_Jupiter) to induce a
detectable (Δχ² = 9) signal on the entire set of transit mid-times.


Validation of planet Kepler-11g 


The outer planetary candidate is well-separated from the inner five in 
orbital period, and its dynamical interactions are not manifested in the

Figure 2 | Detrended data of Fig. 1 shown phased at the period of each
transit signal and zoomed to an 18-h region around mid-transit. 
Overlapping transits are not shown, nor were they used in the model. Each 
panel has an identical vertical scale, to show the relative depths, and identical 
horizontal scale, to show the relative durations. The colour of each planet’s 
model light curve matches the colour of the dots marking its transits in Fig. 1. 


data presently available. Thus, we only have a weak upper bound on its 
mass, and unlike the other five candidates, its planetary nature is not 
confirmed by dynamics. The signal (Fig. 2f) has the characteristics of a 
transiting planet and is far too large to have a non-negligible chance of 
being due to noise, but the possibility that it could be an astrophysical 
false-positive must be addressed. To obtain a Bayesian estimate of the 
probability that the events seen are due to a sixth planet transiting the 
star Kepler-11, we must compare estimates of the a priori likelihood of 
such a planet and of a false positive. This is the same basic methodology
as was used to validate planet Kepler-9d (ref. 6). 

We begin by using the BLENDER code’ to explore the wide range of 
false positives that might mimic the Kepler-11g signal, by modelling 
the light curve directly in terms of a blend scenario. The overwhelming 
majority of such configurations are excluded by BLENDER. We then 
use all other observational constraints to rule out additional blends, 
and we assess the a priori likelihood of the remaining false positives. 
Two classes of false positives were considered: (1) the probability of an 
eclipsing pair of objects that is physically associated with Kepler-11 
providing as good a fit to the Kepler data as provided by a planet 
transiting the primary star was found to be 0.31 × 10⁻⁶; (2) the prob-
ability that a background eclipsing binary or star + planet pair yielding

Figure 3 | Transit timing variations and dynamical fits. Observed mid-times
of planetary transits (see section 3 of the Supplementary Information for
transit-fitting method and Supplementary Table 2 for transit times) minus a
calculated linear ephemeris are plotted as dots with 1σ error bars; colours
correspond to the planetary transit signals in Figs 1 and 2. The times derived 
from the ‘circular fit’ model described in Supplementary Table 4 are given by 
the open diamonds. Contributions of individual planets to these variations are 
shown in Supplementary Fig. 6a. 


a signal of appropriate period, depth and shape could be present and 
not have been detected as a result of a centroid shift in the in-transit 
data, or other constraints from spectroscopy and photometry, was 
found to be 0.58 × 10⁻⁶. Thus the total a priori probability of a signal
mimicking a planetary transit is 0.89 × 10⁻⁶. There is a 0.5 × 10⁻³ a
priori probability of a transiting sixth planet in the mass–period
domain. This value was conservatively estimated (not accounting for 
the coplanarity of the system; the value would increase by an order of 
magnitude if we were to assume an inclination distribution consistent 
with seeing transits of the five inner planets) using the observed dis- 
tribution of extrasolar planets'*”’. Details of these calculations are 
presented in section 5 of the Supplementary Information and 
Supplementary Figs 8-11. Taking the ratio of these probabilities yields 
a total false alarm probability of 1.8 × 10⁻³, which is small enough for
us to consider Kepler-11g to be a validated planet. 
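
The arithmetic behind the false alarm probability is a ratio of priors, and can be written out as below. The exponents used here (10⁻⁶ for the two false-positive classes and 10⁻³ for the planet prior) are an assumed reading of the quoted figures, chosen so that the ratio reproduces the stated 1.8 × 10⁻³; the function name is hypothetical.

```python
def false_alarm_probability(false_positive_priors, planet_prior):
    """Ratio of the summed false-positive priors to the planet prior.

    This is the simple ratio described in the text; it approximates the
    posterior odds well only when the false-positive total is much smaller
    than the planet prior.
    """
    return sum(false_positive_priors) / planet_prior

# Assumed values: physically associated eclipsing pair, background blend, planet prior.
fap = false_alarm_probability([0.31e-6, 0.58e-6], 0.5e-3)
```

The text notes that allowing for the coplanarity of the system would raise the planet prior by about an order of magnitude, which would lower this ratio by the same factor.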


Long-term stability and coplanarity 

One of the most striking features about the Kepler-11 system is how 
close the orbits of the planets are to one another. From suites of numer-
ical integrations, dynamical survival of systems with more than three
comparably spaced planets for at least 10¹⁰ orbits has been shown only
if the spacing between orbital semi-major axes ($a_o - a_i$) exceeds a critical

Table 1 | Planet properties

Planet | Period (days) | Epoch (BJD) | Semi-major axis (AU) | Inclination (°) | Transit duration (h) | Transit depth (millimagnitude) | Radius (R⊕) | Mass (M⊕) | Density (g cm⁻³)
b | 10.30375 ± 0.00016 | 2,454,971.5052 ± 0.0077 | 0.091 ± 0.003 | 88.5 (+…/−…) | 4.02 ± 0.08 | 0.31 ± 0.01 | 1.97 ± 0.19 | 4.3 (+…/−…) | …
c | 13.02502 ± 0.00008 | 2,454,971.1748 ± 0.0031 | 0.106 ± 0.004 | 89.0 (+…/−…) | 4.62 ± 0.04 | 0.82 ± 0.01 | 3.15 ± 0.30 | 13.5 (+…/−…) | 2.3 (+…/−…)
d | 22.68719 ± 0.00021 | 2,454,981.4550 ± 0.0044 | 0.159 ± 0.005 | 89.3 (+…/−…) | 5.58 ± 0.06 | 0.80 ± 0.02 | 3.43 ± 0.32 | 6.1 (+…/−…) | 0.9 (+…/−…)
e | 31.99590 ± 0.00028 | 2,454,987.1590 ± 0.0037 | 0.194 ± 0.007 | 88.8 (+…/−…) | 4.33 ± 0.07 | 1.40 ± 0.02 | 4.52 ± 0.43 | 8.4 (+…/−…) | 0.5 (+…/−…)
f | 46.68876 ± 0.00074 | 2,454,964.6487 ± 0.0059 | 0.250 ± 0.009 | 89.4 (+…/−…) | 6.54 ± 0.14 | 0.55 ± 0.02 | 2.61 ± 0.25 | 2.3 (+…/−…) | 0.7 (+…/−…)
g | 118.37774 ± 0.00112 | 2,455,120.2901 ± 0.0022 | 0.462 ± 0.016 | 89.8 (+…/−…) | 9.60 ± 0.13 | 1.15 ± 0.03 | 3.66 ± 0.35 | <300 | -


R⊕, radius of the Earth; M⊕, mass of the Earth. Planetary periods and transit epochs are the best-fitting linear ephemerides. Periods are given as viewed from the barycentre of our Solar System. Because Kepler-11
is moving towards the Sun with a radial velocity of 57 km s⁻¹ (Supplementary Fig. 1), actual orbital periods in the rest frame of Kepler-11 are a factor of 1.00019 times as long as the values quoted. Uncertainty in
epoch is median absolute deviation of transit times from this ephemeris; uncertainty in period is this quantity divided by the number of orbits between the first and last observed transits. Radii are from
Supplementary Table 2; uncertainties represent 1σ ranges, and are dominated by uncertainties in the radius of the star. The mass estimates are the uncertainty-weighted means of the three dynamical fits
(Supplementary Table 4) to TTV observations (Supplementary Table 2) and the quoted ranges cover the union of 1σ ranges of these three fits. One of these fits constrains all of the planets to be on circular orbits, the
second one allows only planets Kepler-11b and Kepler-11c to have eccentric orbits, and the third solves for the eccentricities of all five planets Kepler-11b to Kepler-11f; see section 4 of the Supplementary 
Information. Stability considerations may preclude masses near the upper ends of the quoted ranges for the closely spaced inner pair of planets. Inclinations are with respect to the plane of the sky. 
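
The factor of 1.00019 quoted in the table notes follows from the classical Doppler shift for a source approaching at 57 km s⁻¹. A minimal sketch of that arithmetic, assuming the non-relativistic formula (the function name is hypothetical):

```python
C_KM_S = 299_792.458  # speed of light in km/s

def rest_frame_period_factor(approach_speed_km_s):
    """Factor by which rest-frame periods exceed observed ones for an approaching source.

    Uses the classical Doppler relation P_obs = P_rest * (1 - v/c).
    """
    return 1.0 / (1.0 - approach_speed_km_s / C_KM_S)

factor = rest_frame_period_factor(57.0)
# Rest-frame period of Kepler-11b, from the tabulated barycentric period in days.
period_b_rest = 10.30375 * factor
```

The correction amounts to roughly 0.002 days for the innermost planet, larger than the quoted uncertainty on its period.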


number ($\Delta_{\mathrm{crit}} \approx 9$) of mutual Hill sphere radii, $[(M_i + M_o)/(3M_*)]^{1/3}\,(a_i + a_o)/2$,
where the subscripts i and o refer to the inner
and outer planets, respectively, and * refers to the star (here Kepler-
11). All of the observed pairs of planets satisfy this criterion, apart from
the inner pair, Kepler-11b and Kepler-11c (section 4.1 of the Sup- 
plementary Information). These two planets are far enough from one 
another to be Hill stable in the absence of other bodies’ (that is, in the 
three-body problem), and they are distant enough from the other pla- 
nets that interactions between the subsystems are likely to be weak. 
Thus stability is possible, although by no means assured. So we inte- 
grated several systems that fit the data (given in Supplementary Table 4) 
for 2.5 × 10⁸ years, as detailed in section 4.1 of the Supplementary 
Information. Weak chaos is evident both in the mean motions and 
the eccentricities (Supplementary Fig. 7), but the variations are at a 
low enough level to be consistent with long-term stability. 
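The spacing criterion discussed above compares the difference in semi-major axes of neighbouring planets with their mutual Hill sphere radius. The following is a minimal sketch of that comparison, using the standard definition of the mutual Hill radius (cube root of the planet-to-star mass ratio over three, times the mean semi-major axis). The function names and the example masses and orbits are hypothetical and are not the fitted values for Kepler-11.

```python
def mutual_hill_radius(m_inner, m_outer, m_star, a_inner, a_outer):
    """Mutual Hill sphere radius of two planets (masses in the same
    units as m_star; semi-major axes in any common length unit)."""
    mass_ratio = (m_inner + m_outer) / (3.0 * m_star)
    return mass_ratio ** (1.0 / 3.0) * (a_inner + a_outer) / 2.0

def separation_in_hill_radii(m_inner, m_outer, m_star, a_inner, a_outer):
    """Orbital spacing (a_outer - a_inner) in units of the mutual Hill radius."""
    r_hill = mutual_hill_radius(m_inner, m_outer, m_star, a_inner, a_outer)
    return (a_outer - a_inner) / r_hill

# Hypothetical pair: two 5-Earth-mass planets around a 1-solar-mass star,
# at 0.10 and 0.12 AU (illustrative numbers only).
EARTH_IN_SOLAR = 3.003e-6  # Earth mass in solar masses
delta = separation_in_hill_radii(5 * EARTH_IN_SOLAR, 5 * EARTH_IN_SOLAR,
                                 1.0, 0.10, 0.12)
print(f"separation = {delta:.1f} mutual Hill radii")
```

A pair whose separation falls below the critical number would then be examined individually, as is done for the inner pair in the text.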

It is also of interest to determine whether this planetary system 
truly is as nearly coplanar as the Solar System, or perhaps even more 
so. Given that the planets all transit the star, they individually must 
have nearly edge-on orbits. As discussed above and quantified in 
Supplementary Table 3, the duration of planet Kepler-11e’s transit 
implies an inclination to the plane of the sky of 88.8°, those of the two 
innermost planets suggest comparable inclinations, whereas those of 
the three other planets indicate values closer to 90°. But even though 
each of the planetary orbits is viewed nearly edge-on, they could be 
rotated around the line of sight and mutually inclined to each other. 
The more mutually inclined a given pair of planets is, the smaller the 
probability that multiple planets transit****. We therefore ran Monte 
Carlo simulations to assess the probability of a randomly positioned 
observer viewing transits of all five inner planets assuming that rela- 
tive planetary inclinations were drawn from a Rayleigh distribution 
about a randomly selected plane. The results, displayed in Fig. 4 and 
Supplementary Table 6, suggest a mean mutual inclination of 1-2°. 
Details of these calculations are provided in section 6 of the 
Supplementary Information. 
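To make the logic of such a Monte Carlo calculation concrete, the sketch below estimates the fraction of random viewing directions from which every planet in a set transits, with each orbit tilted from a reference plane by a Rayleigh-distributed angle about a randomly oriented line of nodes. It is a simplified illustration, not the calculation of section 6 of the Supplementary Information: circular orbits, a point-like planet, the function name and the example values of R_star/a are all assumptions made here.

```python
import math
import random

def all_transit_probability(rstar_over_a, sigma_deg, trials=20_000, seed=0):
    """Fraction of random viewing directions from which every planet transits.

    rstar_over_a : list of R_star/a for each planet (circular orbits assumed)
    sigma_deg    : Rayleigh scale parameter of the inclinations, in degrees
    """
    rng = random.Random(seed)
    sigma = math.radians(sigma_deg)
    hits = 0
    for _ in range(trials):
        # Random observer direction, uniform on the sphere.
        oz = rng.uniform(-1.0, 1.0)
        psi = rng.uniform(0.0, 2.0 * math.pi)
        s = math.sqrt(1.0 - oz * oz)
        ox, oy = s * math.cos(psi), s * math.sin(psi)
        for ratio in rstar_over_a:
            # Tilt of this orbit from the reference plane (Rayleigh draw by
            # inverse transform) about a randomly oriented line of nodes.
            tilt = sigma * math.sqrt(-2.0 * math.log(1.0 - rng.random()))
            phi = rng.uniform(0.0, 2.0 * math.pi)
            nx = math.sin(tilt) * math.cos(phi)
            ny = math.sin(tilt) * math.sin(phi)
            nz = math.cos(tilt)
            # The planet transits if the line of sight lies within
            # asin(R_star/a) of the orbital plane.
            if abs(nx * ox + ny * oy + nz * oz) >= ratio:
                break
        else:
            hits += 1
    return hits / trials

# Hypothetical R_star/a values for five planets (illustrative only).
ratios = [0.10, 0.08, 0.06, 0.05, 0.04]
for sigma in (0.0, 2.0):
    print(sigma, all_transit_probability(ratios, sigma))
```

For perfectly coplanar orbits the result tends to the smallest R_star/a in the set, and it falls as the Rayleigh scale grows, which is the behaviour that lets the observed multiplicity constrain the mutual inclinations.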

Mutual inclinations around the line of sight give rise to inclination 
changes, which would manifest themselves as transit duration 
changes. We notice no such changes. The short baseline, small
signal-to-noise ratio and small planet masses render these dynamical
constraints weak at the present time for all planets but Kepler-11e.
The only planet in the system with an inclination differing
significantly from 90° is Kepler-11e, and we find that the duration of its
transits does not change by more than 2% over the time span of the 
light curve. If planet Kepler-11e’s orbit were rotated around the line of 
sight by just 2° compared to all the other components of the system, 
then with the masses listed in Table 1 the other planets would exert 
sufficient torque on its orbit to violate this limit. 


Planet compositions and formation 


Although the Kepler-11 planetary system is extraordinary, it also tells 
us much about the ordinary. Measuring both the radii and masses of 
small planets is extremely difficult, especially for cooler worlds farther 
from their star that are not heated above 1,000 K. (Very high tempera- 
tures can physically alter planets, producing anomalous properties.) 
The planetary sizes obtained from transit depths and planetary masses 
from dynamical interactions together yield insight into planetary 
composition. 
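A first step from a measured mass and radius towards composition is the bulk density, which scales as mass divided by radius cubed. The sketch below works in Earth units; the function name is hypothetical, the Kepler-10b values (4.6 Earth masses, 1.4 Earth radii) are those reported for that planet, and the second planet is an invented example chosen only to show how a larger radius at similar mass lowers the density.

```python
EARTH_DENSITY_G_CM3 = 5.51  # mean density of the Earth

def bulk_density(mass_earth: float, radius_earth: float) -> float:
    """Mean density in g cm^-3 from mass and radius in Earth units."""
    return EARTH_DENSITY_G_CM3 * mass_earth / radius_earth ** 3

# Kepler-10b, using 4.6 Earth masses and 1.4 Earth radii.
print(f"Kepler-10b: {bulk_density(4.6, 1.4):.1f} g cm^-3")
# Hypothetical planet of similar mass but three Earth radii.
print(f"hypothetical: {bulk_density(5.0, 3.0):.1f} g cm^-3")
```

Density alone does not fix composition, because different mixtures of rock, ices and gas can give the same mean value.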

Figure 5 plots radius as a function of mass for the five newly dis- 
covered planets the masses of which have been measured. Compared 
to Earth, each of these planets is large for its mass. Most of the volume 
of each of the planets Kepler-11c to Kepler-11f is occupied by low- 
density material. It is often useful to think of three classes of planetary 
materials, from relatively high to low density: rocks/metals, ‘ices’ 
dominated by H2O, CH4 and NH3, and H/He gas. All of these
components could have been accumulated directly from the protoplanetary 
disk during planet formation. Hydrogen and steam envelopes can also 
be the product of chemical reactions and out-gassing of rocky planets, 
but only up to 6% and 20% by mass, respectively”. In the Kepler-11 
Figure 4 | Transit probabilities as a function of relative orbital inclinations 
of planets orbiting Kepler-11. Results of Monte Carlo simulations to assess 
the probability of a randomly positioned observer viewing transits of various 
combinations of observed and hypothesized planets around the star Kepler-11, 
assuming that relative planetary inclinations were drawn from a Rayleigh 
distribution about a randomly selected plane. The solid blue curve shows the 
probability of all five inner planets (Kepler-11b to Kepler-11f) to be seen 
transiting. The solid pink curve shows the probability of all six observed planets 
to be seen transiting. The ratio of the orbital period of planet Kepler-11g to that 
of Kepler-11f is substantially larger than that for any other neighbouring pair of 
transiting planets in the system. If we hypothesize that a seventh planet orbits 
between these objects with a period equal to the geometric mean of planets 
Kepler-11f and Kepler-11g, then the probability of observing transits of any 
combination totalling six of these seven planets is shown in the dashed golden 
curve. The dashed green curve shows the probability of the specific observed six 
planets to be seen transiting. Details of these calculations are provided in 
section 6 of the Supplementary Information, and numerical results are given in 
Supplementary Table 6. 
Figure 5 | Mass-radius relationship of small transiting planets, with Solar 
System planets shown for comparison. Planets Kepler-11b to Kepler-11f are 
represented by filled circles with 1σ error bars, with their letters written above; 
values and ranges are as given in Table 1. Other transiting extrasolar planets in 
this size range are shown as open squares, representing, in order of ascending 
radius, Kepler-10b, CoRoT-7b, GJ 1214b, Kepler-4b, GJ 436b and HAT-P-11b. 
The triangles (labelled V, E, U and N) correspond to Venus, Earth, Uranus and 
Neptune, respectively. The colours of the points show planetary temperatures 
(measured for planets in our Solar System, computed mean planet-wide 
equilibrium temperatures for Bond albedo = 0.2 for the extrasolar planets), 
with values shown on the colour scale on the right. Using previously 
implemented planetary structure and evolution models, we plot mass-radius
curves for 8-Gyr-old planets, assuming T_eff = 700 K. The solid black 
curve corresponds to models of planets with Earth-like rock-iron composition. 
The higher dashed curve corresponds to 100% H2O, using the SESAME 7154 
H2O equation of state. All other curves use a water or H2/He envelope on top of 
the rock-iron core. The lower dashed curve is 50% H2O by mass, while the 
dotted curves are H2/He envelopes that make up 2%, 6%, 10% and 20% of the 
total mass. There is significant degeneracy in composition constrained only by 
mass and radius measurements. Planets Kepler-11d, Kepler-11e and
Kepler-11f appear to require a H2/He envelope, much like Uranus and Neptune, while 
Kepler-11b and Kepler-11c may have H2O and/or H2/He envelopes. We note 
that multi-component and mixed compositions (not shown above), including 
rock/iron, H2O, and H2/He, are expected and lead to even greater degeneracy in 
determining composition from mass and radius alone. 


system, the largest planets with measured masses, Kepler-11d and 
Kepler-11e, must contain large volumes of H, and low-mass planet 
Kepler-11f probably does as well. Planets Kepler-11b and Kepler-11c 
could be rich in ‘ices’ (probably in the fluid state, as in Uranus and 
Neptune) and/or a H/He mixture. (The error bars on mass and radius 
for Kepler-11b allow for the possibility of an iron-depleted nearly pure 
silicate composition, but we view this as highly unlikely on cosmogonic 
grounds.) In terms of mass, all five of these planets must be primarily 
composed of elements heavier than helium. Future atmospheric 
characterization to distinguish between H-dominated or steam atmo- 
spheres would tell us more about the planets’ bulk composition and 
atmospheric stability”®. 

Planets Kepler-11b and Kepler-11c have the largest bulk densities 
and would need the smallest mass fraction of hydrogen to fit their 
radii. Using an energy-limited escape model”, we estimate a hydrogen 
mass-loss rate of several times 10⁹ g s⁻¹ for each of the five inner 
planets, leading to the loss of ~0.1 Earth masses of hydrogen over 
10 Gyr. This is less than a factor of ten below total atmosphere loss for 
several of the planets. The modelling of hydrogen escape for strongly 
irradiated exoplanets is not yet well-constrained by observations*”’, 
so larger escape rates are possible. This suggests the scenario that 
planets Kepler-11b and Kepler-11c had larger H-dominated atmo- 
spheres in the past and lost these atmospheres during an earlier era 
when the planets had larger radii, lower bulk density, and a more 
active primary star, which would all favour higher mass-loss rates. 
The comparative planetary science permitted by the planets in 
the Kepler-11 system may allow for advances in understanding these 
mass-loss processes. 
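The cumulative loss quoted above (about 0.1 Earth masses of hydrogen over 10 Gyr) can be converted back into a mean escape rate as a consistency check. The sketch below assumes a constant rate over the whole interval, which is a simplification, and the function name is hypothetical; the result is of order 10⁹ g s⁻¹.

```python
EARTH_MASS_G = 5.972e27      # grams
SECONDS_PER_YEAR = 3.156e7   # Julian year, rounded

def mean_loss_rate(mass_lost_earth: float, duration_gyr: float) -> float:
    """Constant mass-loss rate (g/s) that removes the given mass in the given time."""
    seconds = duration_gyr * 1e9 * SECONDS_PER_YEAR
    return mass_lost_earth * EARTH_MASS_G / seconds

rate = mean_loss_rate(0.1, 10.0)
print(f"{rate:.1e} g/s")
```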

The inner five observed planets of the Kepler-11 planetary system 
are quite densely packed dynamically, in that significantly closer 
orbits would not be stable for the billions of years that the star has 
resided on the main sequence. The eccentricities of these planets are 
small, and the inclinations very small. The planets are not locked into 
low-order mean motion resonances. 

Kepler-11 is a remarkable planetary system whose architecture and 
dynamics provide clues to its formation. The significant light-gas 
component of planets Kepler-11d, Kepler-11e and Kepler-11f implies 
that at least these three bodies formed before the gaseous component 
of their protoplanetary disk dispersed, probably taking no longer than 
a few million years to grow to near their present masses. The small 
eccentricities and inclinations of all five inner planets imply dissipa- 
tion during the late stages of the formation/migration process, which 
means that gas and/or numerous bodies much less massive than the 
current planets were present. The lack of strong orbital resonances 
argues against slow, convergent migration of the planets, which would 
lead to trapping in such configurations, although dissipative forces 
could have moved the inner pair of planets out from the nearby 5:4 
resonance”’. In situ formation would require a massive protoplanetary 
disk of solids near the star and/or trapping of small solid bodies whose 
orbits were decaying towards the star as a result of gas drag; it would 
also require accretion of significant amounts of gas by hot small rocky 
cores, which has not been demonstrated. (The temperature this close 
to the growing star would have been too high for ices to have con- 
densed.) The Kepler spacecraft is scheduled to continue to return data 
on the Kepler-11 planetary system for the remainder of its mission, 
and the longer temporal baseline afforded by these data will allow for 
more accurate measurements of the planets and their interactions. 
Genomic structural variants (SVs) are abundant in humans, differing from other forms of variation in extent, origin and 
functional impact. Despite progress in SV characterization, the nucleotide resolution architecture of most SVs remains 
unknown. We constructed a map of unbalanced SVs (that is, copy number variants) based on whole genome DNA 
sequencing data from 185 human genomes, integrating evidence from complementary SV discovery approaches with 
extensive experimental validations. Our map encompassed 22,025 deletions and 6,000 additional SVs, including 
insertions and tandem duplications. Most SVs (53%) were mapped to nucleotide resolution, which facilitated 
analysing their origin and functional impact. We examined numerous whole and partial gene deletions with a 
genotyping approach and observed a depletion of gene disruptions amongst high frequency deletions. Furthermore, 
we observed differences in the size spectra of SVs originating from distinct formation mechanisms, and constructed a 
map of SV hotspots formed by common mechanisms. Our analytical framework and SV map serve as a resource for 
sequencing-based association studies. 


Introduction 


Unbalanced structural variants (SVs), or copy number variants 
(CNVs), involving large-scale deletions, duplications and insertions 
form one of the least well studied classes of genetic variation. The 
fraction of the genome affected by SVs is comparatively larger than 
that accounted for by single nucleotide polymorphisms’ (SNPs), 
implying significant consequences of SVs on phenotypic variation. 
SVs have already been associated with diverse diseases, including 
autism**, schizophrenia** and Crohn’s disease*’. Furthermore, 
locus-specific studies suggest that diverse mechanisms may form 
SVs de novo, with some mechanisms involving complex rearrange- 
ments resulting in multiple chromosomal breakpoints*’. 

Initial microarray-based SV surveys focused on large gains and 
losses®*, with recent advances in array technology widening the 
accessible size spectrum towards smaller SVs". Microarray-based 
surveys commonly mapped SVs to approximate genomic locations. 
However, a detailed SV characterization, including analyses of SV 
origin and impact, requires knowledge of precise SV sequences. 
Advances in sequencing technology have enabled applying 
sequence-based approaches for mapping SVs at a fine scale’*~’. 
These approaches include: (1) paired-end mapping (or read pair 
‘RP’ analysis) based on sequencing and analysis of abnormally map- 
ping pairs of clone ends'**** or high-throughput sequencing frag- 
ments’*’”'*; (2) read-depth (‘RD’) analysis, which detects SVs by 
analysing the read depth-of-coverage’®*!”*”; (3) split-read (‘SR’) 
analysis, which evaluates gapped sequence alignments for SV detec- 
tion**”; and (4) sequence assembly (‘AS’), which enables the fine- 
scale discovery of SVs, including novel (non-reference) sequence 
insertions*® **. Sequence-based SV discovery approaches have previ- 
ously been applied to a limited (<20) number of genomes, leaving the 
fine-scale architecture of most common SVs unknown. 

Sequence data generated by the 1000 Genomes Project (1000GP) 
provide an unprecedented opportunity to generate a comprehensive 
SV map. The 1000GP recently generated 4.1 terabases of raw sequence 
in two pilot projects targeting whole human genomes (Supplementary
Table 1). These studies comprise a population-scale project, 
termed ‘low-coverage project’, in which 179 unrelated individuals 
were sequenced with an average coverage of 3.6×, including 59 
Yoruba individuals from Nigeria (YRI), 60 individuals of European 
ancestry from Utah (CEU), 30 of Han ancestry from Beijing (CHB), 
and 30 of Japanese ancestry from Tokyo (JPT; the latter two were 
jointly analysed as JPT+CHB). In addition, a high-coverage project, 
termed the ‘trio project’, was carried out, with individuals of a CEU 
and a YRI parent-offspring trio sequenced to 42× coverage on average. 

We report here the results of analyses undertaken by the Structural 
Variation Analysis Group of the 1000GP. The group’s objectives were 
to discover, assemble, genotype and validate SVs of 50 base pairs (bp) 
and larger in size, and to assess and compare different sequence-based 
SV detection approaches. The focus of the group was initially on 
deletions, a variant class often associated with disease’, for which rich 
control data sets and diverse ascertainment approaches exist’*?*”*. 
Less focus was placed on insertions and duplications** and none on 
balanced SV forms (such as inversions). Specifically, we applied nine- 
teen methods to generate an SV discovery set. We further generated 
reference genotypes for most deletions, assessed the SVs’ functional 
impact and stratified SV formation mechanism with respect to variant 
size and genomic context. 


Assessment of SV discovery methods 

We incorporated the SV discovery methods into a pipeline (Fig. 1a, b), 
with the goal of ascertaining different SV types and assessing each 
method for its ability to discover SVs. The methods detected SVs by 
analysing RD, RP, SR and AS features, or by combining RP and RD 
features (abbreviated as ‘PD’). Altogether we generated 36 SV call-sets 
by applying the methods on trio and low-coverage whole genome 
Figure 1 | SV discovery and genotyping in population scale sequence data. 
a, Schematic depicting the different modes (that is, approaches) of sequence- 
based SV detection we used. The RP approach assesses the orientation and 
spacing of the mapped reads of paired-end sequences’*”* (reads are denoted by 
arrows); the RD approach evaluates the read depth-of-coverage”*”®; the SR 
approach maps the boundaries (breakpoints) of SVs by sequence alignment; 
the AS approach assembles SVs*”. b, Integrated pipeline for SV discovery, 
sequence data, and by identifying SVs as genomic differences relative 
to a human reference, corresponding to the reference genome, or to a set 
of individuals (that is, population reference; Supplementary Table 2). 
We initially identified SVs as deletions, tandem duplications, novel 
sequence insertions and mobile element insertions (MEIs) relative to 
the human reference. Subsequent comparative analyses involving prim- 
ate genomes enabled us to classify SVs as deletions, duplications, or 
insertions relative to inferred ancestral genomic loci, reflecting mechan- 
isms of SV formation (see below). DNA reads analysed by SV discovery 
methods were initially mapped to the human reference genome using a 
variety of alignment algorithms. Most of these algorithms mapped each 
read to a single genomic position, although one algorithm (mrFAST) 
also considered alternative mapping positions for reads aligning to 
repetitive regions (see Supplementary Tables 2-4 for method-specific 
parameters and full SV call-sets). We filtered each call-set by excluding 
SVs <50 bp, which are reported elsewhere*’. Many SVs showed support 
from distinct SV discovery methods, as exemplified by a common dele- 
tion, previously associated with body-mass index*®* (BMI), that we iden- 
tified with RP, RD and SR methods (Fig. 1c). Nonetheless, we observed 
notable differences between methods (Fig. 2a–c) in terms of genomic 
regions ascertained (Supplementary Fig. 1), accessible SV size-range 
(Fig. 2a), and breakpoint precision (Fig. 2c, Supplementary Fig. 2). 

To estimate call-set specificity, we carried out extensive validations 
(Methods), including PCRs for over 3,000 candidate loci and micro- 
array data analyses for 50,000 candidate loci (Supplementary Tables 3, 
4 and Supplementary Fig. 3). We combined PCR and array-based 
analysis results to estimate false discovery rates (FDRs), and found 
that eight call-sets (three deletion, one tandem duplication and four 
insertion call-sets) met the pre-specified specificity threshold 
(FDR ≤ 10%), whereas the other call-sets yielded lower specificity 
(FDRs of 13-89%). 
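The specificity filter described above reduces to computing a false discovery rate per call-set from validation outcomes and comparing it with a 10% ceiling. The sketch below shows that bookkeeping in its simplest form; how PCR and array results were actually combined is described in the Methods and is not reproduced here, and the function names and counts are hypothetical.

```python
def false_discovery_rate(validated: int, invalidated: int) -> float:
    """FDR = false positives / all calls with a conclusive validation result."""
    tested = validated + invalidated
    if tested == 0:
        raise ValueError("no conclusive validation results")
    return invalidated / tested

def meets_threshold(validated: int, invalidated: int, max_fdr: float = 0.10) -> bool:
    """True if the call-set's estimated FDR does not exceed max_fdr."""
    return false_discovery_rate(validated, invalidated) <= max_fdr

# Hypothetical call-sets: (validated, invalidated) counts from validation assays.
call_sets = {"set_A": (470, 30), "set_B": (260, 140)}
for name, (ok, bad) in call_sets.items():
    print(name, f"FDR={false_discovery_rate(ok, bad):.0%}", meets_threshold(ok, bad))
```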


Application of diverse SV discovery methods 
validation and genotyping. Coloured circles represent individual SV discovery 
methods (listed in Supplementary Table 2), with modes indicated by a colour 
scheme: green, RP; yellow, RD; purple, SR; red, AS; green and yellow, methods 
evaluating RP and RD (abbreviated as ‘PD’). c, Example of a deletion, previously 
associated with BMI”, identified independently with RP (green), RD (yellow) and 
SR (red) methods. Grey dots indicate position and mapping quality for individual 
sequence reads. Targeted assembly confirmed the breakpoints detected by SR. 
Figure 2 | Comparative assessment of deletion discovery methods. 
a, Deletion size-range ascertained by different modes of SV discovery. Three 
groups are visible, with AS and SR, PD and RP, as well as RD and ‘RL’ (RP 
analysis involving relatively long range (≥1 kb) insert size libraries, resulting in 
a different deletion detection size range compared to the predominantly used 
<500 bp insert size libraries), respectively, ascertaining similar size-ranges. Pie 
charts display the contribution (%) of different SV discovery modes to the 
release set. Outer pie is based on the number of SV calls; inner pie is based on the 
total number of variable nucleotides. Of note, not all approaches were applied 
across all individuals (see Supplementary Table 2). b, Sensitivity and FDR 
estimates for individual deletion discovery methods based on gold standard sets 
for individuals sequenced at high (NA12878) and low-coverage (NA12156), 
respectively. All depicted estimates are summarized in Supplementary Tables 3, 
4 and 6. Vertical dotted lines correspond to the specificity threshold 
(FDR ≤ 10%). c, Breakpoint mapping resolution of three deletion discovery 
methods (the respective method names are in Supplementary Table 2). The blue 
and red histograms are the breakpoint residuals for predicted deletion start and 
end coordinates, respectively, relative to assembled coordinates (here assessed 
in low-coverage data). The horizontal lines at the top of each plot mark the 98% 
confidence intervals (labelled for each panel), with vertical notches indicating 
the positions of the most probable breakpoint (the distribution mode). 


We assessed the sensitivity of deletion discovery methods further 
by collating data from four earlier surveys'’*’*”* into a gold standard 
(Methods, Supplementary Tables 5, 6 and Supplementary Fig. 4a), 
and specifically assessing the detection sensitivity for an individual 
sequenced at high-coverage (NA12878) as well as for an individual 
sequenced at low-coverage (NA12156). Unsurprisingly, given the 
typical trade-off between sensitivity and specificity, in the trios the 
highest sensitivities were achieved by RD and RP methods with 
FDR > 10% (Fig. 2b). By comparison, in the low-coverage data, the 
individual method with the greatest accuracy (FDR = 3.7%) was the 
second most sensitive based on our gold standard (Fig. 2b), and the 
most sensitive when expanding the gold standard to a larger set of 
individuals (Supplementary Fig. 4b). This method, Genome STRiP (to 
be described elsewhere; Handsaker, R. E., Korn, J. M., Nemesh, J. and 
McCarroll, S. A., unpublished results), integrated both RP and RD 
features (PD), implying that considering different evidence types can 
improve SV discovery. 


Construction of our SV discovery set 


To construct our SV discovery set (‘release set’), we joined calls from 
different discovery methods corresponding to the same SV with a 


Table 1 | Summary of discovered structural variation 
merging approach that was aware of each call-set’s precision in SV 
breakpoint detection (Supplementary Fig. 5 and Methods). Most SVs 
in the release set (61%) were contributed by individual methods meeting
the pre-defined specificity threshold (FDR ≤ 10%). The remaining
39% of calls were contributed by lower specificity methods 
following experimental validation. Altogether, the release set com- 
prised 22,025 deletions, 501 tandem duplications, 5,371 MEIs and 
128 non-reference insertions (Table 1, Supplementary Table 7). 
With our gold standard we estimated an overall sensitivity of deletion 
discovery of 82% in the trios, and 69% in low-coverage sequence 
(Fig. 2b) using a 1-bp overlap criterion. When instead applying a 
stringent 50% reciprocal overlap criterion for sensitivity assessment 
(which required SV sizes inferred on different experimental platforms 
to be in close agreement), our sensitivity estimates decreased by 12% 
and 18%, respectively, in trio and low-coverage sequence (Supplemen- 
tary Table 8). We examined further an alternative SV discovery 
approach that involved the pairwise integration of deletion discovery 
methods, and tested its ability to discover SVs without relying on the 
inclusion of lower specificity calls following experimental validation 
(this approach resulted in the generation of the ‘algorithm-centric set’; 
Fig. 1b). Whereas this alternative approach resulted in an increased 
number (by ~ 13%) of high-specificity (FDR < 10%) calls compared to 
the release set (Supplementary Text), overall it resulted in fewer SV 
calls owing to its decreased sensitivity at the lower (<200 bp) SV size 
range. In the following analyses we thus focused on the release set. 
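The two matching rules used for the sensitivity estimates above, a 1-bp overlap and a 50% reciprocal overlap, differ only in how much of each interval must be shared. The sketch below implements both for intervals on a single chromosome and shows how the stricter rule lowers the estimated sensitivity. The function names and coordinates are hypothetical, and half-open (start, end) intervals are an assumption made for this illustration.

```python
def overlap_bp(a, b):
    """Number of shared bases between half-open intervals (start, end)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))

def matches_1bp(a, b):
    """True if the two intervals share at least one base."""
    return overlap_bp(a, b) >= 1

def matches_reciprocal(a, b, fraction=0.5):
    """True if the overlap covers at least `fraction` of BOTH intervals."""
    shared = overlap_bp(a, b)
    return shared >= fraction * (a[1] - a[0]) and shared >= fraction * (b[1] - b[0])

def sensitivity(gold, calls, match):
    """Fraction of gold-standard intervals matched by at least one call."""
    found = sum(any(match(g, c) for c in calls) for g in gold)
    return found / len(gold)

# Hypothetical deletions on one chromosome, as (start, end) coordinates.
gold = [(100, 1100), (5000, 5400), (9000, 9800)]
calls = [(150, 1050), (5300, 7000), (20000, 20500)]
print(sensitivity(gold, calls, matches_1bp))
print(sensitivity(gold, calls, matches_reciprocal))
```

In this toy example the second gold-standard deletion is touched by a call of very different size, so it counts under the 1-bp rule but not under the reciprocal rule.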


Extent and impact of our SV discovery set 


We next assessed the extent and impact of our SV discovery (release) 
set. The median SV size was 729 bp (mean = 8 kilobases), approxi- 
mately four times smaller than in a recent tiling CGH-based study’, 
reflecting the high resolution of DNA-sequence-based SV discovery. 
We also compared our set to a recent survey of SVs in an individual 
genome* based on capillary sequencing and array-based analyses”, 
and observed a similar size distribution for deletions, but differences 
in the size distributions of other SV classes, reflecting underlying 
differences in SV ascertainment (Supplementary Fig. 6). By comparing 
our SVs to databases of structural variation and to additional personal 
genome data sets, we classified 15,556 SVs in our set as novel, with an 
enrichment of low frequency SVs and small SVs amongst the novel 
variants (Methods and Supplementary Text). 

A major advantage of sequence-based SV discovery is the nucleo- 
tide resolution mapping of SVs. We initially mapped the breakpoints 
of 7,066 deletions and 3,299 MEIs using SR and AS features. Using the 
TIGRA-targeted assembly approach (Chen, L. et al., unpublished 
results) we further identified the breakpoints of an additional 4,188 
deletions and 160 tandem duplications, initially discovered by RD, RP 
and PD methods (Methods, Supplementary Tables 3, 4). Altogether, 
we mapped ~ 15,000 SVs at nucleotide resolution, 48% of which were 
novel. Few deletion loci (4.4%) displayed different SV breakpoints in 
different samples, which is explainable by rare TIGRA misassemblies, 
or alternatively, by recurrently formed, multi-allelic SVs (Supplemen- 
tary Text). TIGRA further enabled us to validate an additional 7,359 
SVs by identifying the SVs’ breakpoints (Methods), and to evaluate 
the mapping precision of SV discovery methods (Fig. 2c, Supplemen- 
tary Fig. 2). 

We assessed the putative functional impact of SVs in our set further 
by relating them to genomic annotation. Many SVs (1,775) affected 
coding sequences, resulting in full gene overlaps or exon disruptions 
(Table 2), many of which led to out-of-frame exons (Supplementary 


                                 Deletions  Tandem duplications  Mobile element insertions  Novel sequence insertions  Total
Individual call-sets < 10% FDR   11,215     501                  5,371                      –                          17,087
Validated experimentally*        10,810     –                    –                          128                        10,938
Release set                      22,025     501                  5,371                      128                        28,025

* Only tabulates validated calls which were not already present in individual call-sets with less than 10% FDR. 
Table 2 | Functional impact of our fine resolution SV set 

                           Gene overlap
SV class                   Full gene overlap  Coding exon affected, partial  UTR overlap  Intron overlap  Total gene overlap  Total intergenic
Deletions                  654 (631)          1,093 (1,031)                  315 (290)    7,319 (6,481)   9,381 (8,433)       12,644 (10,386)
Tandem duplications        2 (2)              7 (6)                          9 (5)        197 (62)        215 (75)            286 (76)
Mobile element insertions  –                  3 (2)                          36 (26)      1,304 (97)      1,348 (112)         4,023 (758)
Novel sequence insertions  –                  –                              2 (2)        49 (49)         51 (51)             77 (77)
Sum                        656 (633)          1,119 (1,040)                  351 (309)    8,869 (6,689)   10,995 (8,671)      17,030 (11,280)

Figures in parentheses indicate numbers of validated SVs per category. We inferred gene overlap with Gencode gene annotation. 


Table 9). We related gene disruptions to gene functions, and observed 
significant enrichments for several functional categories, including 
cell defence and sensory perception (Supplementary Table 10). 
High levels of structural variation, including copy number variation, 
were described previously for both processes'***’’. These SVs might 
be maintained in the population by selection for the purpose of func- 
tional redundancy. Whereas most SVs intersecting with genes were 
deletions, several validated tandem duplications and MEIs also inter- 
sected with coding sequences (Table 2). 


Population genetic properties of deletions 

We next sought to generate genotypes for deletions discovered in the 
1000GP data, both to facilitate population genetics analyses and to 
make our SV set amenable to association studies in the form of a 
reference genotype set. In this regard, the Genome STRiP genotyping 
method was developed (Handsaker, R. E., Korn, J. M., Nemesh, J. and 
McCarroll, S. A., unpublished results), a method combining informa- 
tion from RD, RP, SR and haplotype features of population-scale 
sequence data for genotyping (Methods, Supplementary Text). Using 
this approach we generated genotypes for 13,826 autosomal deletions 
in 156 individuals. The genotypes displayed 99.1% concordance with 
CGH array-based’ genotypes (available for 1,970 of the deletions), 
indicating high genotyping accuracy. 

Figure 3 presents allele frequency analyses based on these geno- 
types. As expected, common polymorphisms (minor allele frequency 
(MAF) > 5%) were typically shared across populations, whereas rare 
alleles were frequently observed in only one population (Fig. 3a-c). 
We observed several candidates for monomorphic deletions (that is, 
genomic segments putatively deleted in all individuals), explainable 
by rare insertions present in the reference genome or by remaining 
genotyping inaccuracies (Supplementary Text). 
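The allele frequency classes used above follow directly from diploid genotype calls. The sketch below computes the deletion allele frequency and the minor allele frequency for one locus and applies the 5% cut-off for common polymorphisms; the genotype coding (0, 1 or 2 copies of the deletion, None for a missing call), the function names and the example data are assumptions made for illustration.

```python
def deletion_allele_frequency(genotypes):
    """Frequency of the deletion allele from diploid genotypes coded as
    0, 1 or 2 copies of the deletion; None marks a missing genotype."""
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no called genotypes")
    return sum(called) / (2 * len(called))

def minor_allele_frequency(genotypes):
    """Frequency of whichever allele (deletion or reference) is rarer."""
    freq = deletion_allele_frequency(genotypes)
    return min(freq, 1.0 - freq)

def is_common(genotypes, threshold=0.05):
    """True if the minor allele frequency exceeds the threshold."""
    return minor_allele_frequency(genotypes) > threshold

# Hypothetical genotypes for one deletion in ten individuals.
example = [0, 0, 1, 0, 2, 0, None, 1, 0, 0]
print(minor_allele_frequency(example), is_common(example))

# A candidate monomorphic deletion: every individual is homozygous deleted.
print(deletion_allele_frequency([2, 2, 2, 2]))
```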
Figure 3 | Analysis of deletion presence and absence in three populations. 
a-c, Deletion allele frequencies and observed sharing of alleles across 
populations, displayed for deletions discovered in the CEU (a), YRI (b) and 
JPT+CHB (c) population samples in terms of stacked bars. d, Allele frequency 
spectra for deletions intersecting with intergenic (blue), intronic (yellow) and 
protein-coding sequences (red). 
Next we assessed the allele frequencies of gene deletions. Similar to a 
recent array-based study, we observed a depletion of high-frequency 
alleles among deletions intersecting with protein-coding sequence compared
to other deletions (P = 2.2 × 10⁻¹⁶, KS test), consistent with purifying
selection keeping most gene deletions at low frequency (Fig. 3d). 
Nonetheless, several coding sequence deletions were observed with high 
allele frequency (>80%). Most of these occurred in regions annotated as 
segmental duplications, consistent with lessened evolutionary constraint 
in functionally redundant gene categories”. Intriguingly, common gene 
deletions also affected many unique genes with no obvious paralogues. 
We further analysed the abundance of gene deletions in different popu- 
lations and observed highly differentiated loci, albeit with no statistically 
significant relationship between differentiation and particular categories 
of gene overlap, that is, intronic versus exonic (Supplementary Text). 
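The comparison of allele frequency spectra above relies on a Kolmogorov-Smirnov (KS) test. As an illustration of what that test measures, the sketch below computes the two-sample KS statistic, the largest gap between two empirical cumulative distributions. It does not compute a P value, the choice of the two-sample form is an assumption made here, and the function name and frequencies are hypothetical.

```python
from bisect import bisect_right

def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest absolute
    difference between the two empirical cumulative distributions."""
    a, b = sorted(sample_a), sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    d = 0.0
    # The maximum gap is always attained at one of the observed values.
    for x in set(a) | set(b):
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d

# Hypothetical deletion allele frequencies for two annotation classes.
coding = [0.01, 0.01, 0.02, 0.03, 0.05, 0.10]
intergenic = [0.02, 0.08, 0.15, 0.30, 0.45, 0.60]
print(ks_statistic(coding, intergenic))
```

A statistic near zero indicates similar spectra, whereas a value near one indicates that one class sits almost entirely at lower frequencies than the other.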

By comparing deletion genotypes with genotypes of nearby SNPs, 
we found, consistent with earlier studies’"***, that deletions in geno- 
mic regions accessible to short read sequencing display extensive 
linkage disequilibrium (LD) with SNPs. Most common deletions 
(81%) had one or more SNPs with which they are strongly correlated 
(r² > 0.8; Supplementary Fig. 7). This indicates that many deletions 
mapped in our study will be identifiable through tagging SNPs in 
future studies (Supplementary Text). On the other hand, a fifth of 
the genotyped deletions were not tagged by HapMap SNPs (a figure 
similar to the fraction of SNPs that are not tagged by HapMap 
SNPs*’), implying that these SVs should be genotyped directly in 
association studies. Furthermore, the LD properties of complex SVs 
(for example, multiallelic SVs) have not yet been fully ascertained as 
methods for genotyping such SVs with similar accuracy are still being 
developed. 
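
As a rough illustration of the tagging criterion above, the following sketch computes the squared correlation between a deletion genotype vector and each nearby SNP genotype vector, and reports whether any SNP exceeds $r^2 > 0.8$. It is a simplification under stated assumptions: genotypes are unphased copy counts (0, 1 or 2), the function names and toy data are hypothetical, and this is not the procedure used in the study.

```python
import numpy as np

def genotype_r2(deletion_dosage, snp_dosage):
    """Squared Pearson correlation between two genotype dosage vectors (0, 1, 2)."""
    x = np.asarray(deletion_dosage, dtype=float)
    y = np.asarray(snp_dosage, dtype=float)
    if x.std() == 0 or y.std() == 0:
        return 0.0  # monomorphic site: correlation undefined, treat as untagged
    return float(np.corrcoef(x, y)[0, 1] ** 2)

def best_tag(deletion_dosage, snp_matrix, threshold=0.8):
    """Return (index, r2, is_tagged) for the SNP most correlated with the deletion."""
    r2 = [genotype_r2(deletion_dosage, snp) for snp in snp_matrix]
    best = int(np.argmax(r2))
    return best, r2[best], r2[best] > threshold

# Toy data (hypothetical): 8 individuals, 3 nearby SNPs
deletion = [0, 1, 2, 0, 1, 0, 2, 1]
snps = [
    [0, 1, 2, 0, 1, 0, 2, 1],  # perfect proxy for the deletion
    [2, 1, 0, 1, 1, 2, 0, 0],  # partially anti-correlated
    [1, 1, 1, 0, 2, 1, 0, 1],  # weakly related
]
idx, r2, tagged = best_tag(deletion, snps)
```

A deletion for which `tagged` is false for every SNP on the reference panel corresponds to the roughly one-fifth of deletions that would need direct genotyping.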


SV formation mechanism analysis 


Nucleotide resolution breakpoint information enables inference of SV 
formation mechanisms’*”*. Recent studies broadly distinguished 
between several germline rearrangement classes, some of which 
may comprise more than one SV formation mechanism'*?**™: non- 
allelic homologous recombination (NAHR), associated with long 
sequence similarity stretches around the breakpoints; rearrangements 
in the absence of extended sequence similarity (abbreviated as ‘non- 
homologous’ or NH), associated with DNA repair by non-homologous 
end-joining (NHEJ) or with microhomology-mediated break-induced
replication (MMBIR); the shrinking or expansion of variable number 
of tandem repeats (VNTRs), frequently involving simple sequences, by 
slippage; and MEIs. We distinguished among the classes NAHR, NH, 
VNTR and MEI by examining the breakpoint junction sequences of 
SVs that had initially been discovered as deletions or tandem duplica- 
tions relative to a human reference. 

We first compared these SVs to orthologous primate genomic 
regions to distinguish deletions from insertions/duplications with 
respect to reconstructed ancestral loci using the BreakSeq classifica- 
tion approach*’. This analysis showed that of the 11,254 nucleotide- 
resolution SVs discovered as deletions relative to a human reference, 
21% actually represented insertions and 2% represented tandem 
duplications relative to the putative ancestral genome. Of the remain- 
ing SVs, 60% were classified as deletions relative to ancestral sequence, 
whereas the ancestral state of 17% was undetermined. By comparison, 
out of 160 nucleotide-resolution SVs identified as tandem duplications
relative to the reference genome, 91.6% were classified as duplications
relative to the ancestral genome, whereas the ancestral state of
8.4% remained undetermined (Supplementary Text). Our breakpoint 
analysis revealed that 70.8% of the deletions and 89.6% of the inser- 
tions exhibited breakpoint microhomology/homology ranging from 
2-376 bp in size, with distribution modes of 2 bp (attributable to NH) 
and 15 bp (attributable to MEI), respectively (Fig. 4a, Supplementary 
Text). As expected*’, a small portion of the deletions (16.1%) dis- 
played non-template inserted sequences at their breakpoint junctions. 
By comparison, the tandem duplications showed extensive stretches 
displaying ≥95% sequence identity at the breakpoints, linearly correlating
in length with SV size (Fig. 4a). In addition, most tandem duplications
displayed 2-17 bp of microhomology at the breakpoint
junctions (Supplementary Text). 
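
To make the notion of breakpoint microhomology concrete, here is a minimal sketch that measures it for a deletion on a reference sequence. It counts how far the deleted interval can slide left or right without changing the resulting allele. The function name, coordinates (0-based, half-open) and toy sequences are hypothetical, and this is an illustration only, not the BreakSeq implementation.

```python
def microhomology_length(ref, start, end):
    """Microhomology (bp) at the junction of a deletion removing ref[start:end].

    The deletion can be shifted by k bases without changing the resulting
    sequence exactly when the bases entering and leaving the interval match.
    """
    right = 0
    while end + right < len(ref) and ref[start + right] == ref[end + right]:
        right += 1
    left = 0
    while start - left - 1 >= 0 and ref[start - left - 1] == ref[end - left - 1]:
        left += 1
    return left + right

# Deleted segment "TAGGCA" starts with "TA", and "TA" also follows the deletion
ref_mh = "ACGC" + "TAGGCA" + "TACCGT"
mh = microhomology_length(ref_mh, 4, 10)

# Blunt junction: no shared bases at either breakpoint
ref_blunt = "ACGC" + "GATTCA" + "TGGCGT"
blunt = microhomology_length(ref_blunt, 4, 10)
```

In the first toy case the deletion can be placed at three equivalent positions, which is why breakpoint coordinates for such events are only defined up to the microhomology length.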

We subsequently applied BreakSeq*! to infer formation mechan- 
isms for all SVs classified with regard to ancestral state. Using 
BreakSeq, we inferred NH as the dominating deletion mechanism, 
and MEI as the dominating insertion mechanism (Fig. 4b, c and Sup- 
plementary Table 11). Furthermore, an abundance of microhomology 
at tandem duplication breakpoints suggested frequent formation of 
this SV class by a rearrangement process acting in the absence of 
homology (NH). When relating SV formation to the variant size 
spectrum, we observed marked insertion peaks for MEIs at 300 bp, 
corresponding to Alu elements, and at 6 kb, corresponding to the L1 
class of long interspersed elements (LINEs) (Fig. 4c). By comparison, 
NH- and NAHR-based mechanisms occurred across a wide size- 
range, whereas VNTR expansion/shrinkage, consistent with earlier 
findings’, led to relatively small SV sizes (Fig. 4c, d). 

Furthermore, when displaying the genomic distribution of SVs 
(Fig. 5a), we observed a notable clustering of SVs into ‘SV hotspots’. 
We analysed this clustering in detail by examining the distribution of 
non-overlapping, adjacent SVs, and observed a marked clustering of 
SVs formed by NAHR, VNTR and NH, respectively, a signal extending
to hundreds of kilobases (Fig. 5b). The clustering was influenced
by an abundance of VNTR near the centromeres*' and NAHR near 
the telomeres (Fig. 5a). A significant enrichment of NAHR near 
recombination hotspots (P = 1.3 X 107'°) and segmental duplica- 
tions (P= 3.1 X 10 |”) further contributed to the clustering (Sup- 
plementary Table 13). 

To further explore this clustering we devised a segmentation 
approach for predicting SV hotspots (Methods), which yielded a map 
of 51 putative SV hotspots (Supplementary Table 14). Most of the hot- 
spots (80%) mainly comprised SVs originating from a single formation 
mechanism (Fig. 5c). Most of these corresponded to NAHR hotspots, 
although hotspots dominated by NH and VNTR were also evident. 
These observations indicate that SV formation is frequently associated 
with the locus-specific propensity for genomic rearrangement. 


Conclusions and discussion 


By generating an SV set of unprecedented size along with breakpoint 
assemblies and reference genotypes, we demonstrate the suitability of 
population-scale sequencing for SV analysis. Nucleotide resolution 
data allow the construction of reference data sets and make SVs readily 
assessable across different experimental platforms using genotyping 
approaches. Our fine-scale map enabled us to examine the functional 
impact of SVs, as exemplified by the set of gene disruption variants we 
reported, which will be of value for genome and exome sequencing 
studies. 

Our map further enabled us to examine size spectra of SV forma- 
tion mechanisms and led us to identify genomic SV hotspots that are 
commonly dominated by a single formation mechanism. Recurrent 
rearrangements, implicated in genomic disorders, are hypothesized to 
be associated with local genome architecture”, for example, with 
segmental duplications that facilitate NAHR. Also, DNA rearrange- 
ment in the absence of homology, that is, MMBIR, has been implicated 

Figure 4 | Contribution of SV formation mechanisms to the SV size
spectrum. a, Breakpoint junction homology/microhomology length plotted as 
a function of SV size for SVs originally identified as deletions compared to a 
human reference. Dots are coloured according to the SVs’ classification as 
deletions, insertions/duplications, or ‘undetermined’ relative to inferred 
ancestral genomic loci. Gray lines mark groups of SVs likely formed by a 
common formation mechanism. The diagonal highlights tandem duplications 
(and few reciprocal deletion events), in which the length of the duplicated 
sequence correlates linearly with the length of the longest breakpoint junction 
sequence identity stretch. The ellipses indicate MEIs, that is, Alu (~300 bp) and 
L1 (~6 kb) insertions, associated with target site duplications of up to 28 bp in
size at the breakpoints. The horizontal group corresponds mostly to NH- 
associated deletions with <10 bp microhomology at the breakpoints. The 
remaining (ungrouped) SVs comprise truncated MEIs, VNTR expansion and 
shrinkage events, as well as NAHR-associated deletions and duplications. 

b, Relative contributions of SV formation mechanisms in the genome. 
Numbers of SVs are displayed on the outer pie chart and affected base pairs on 
the inner. Left panel, SVs classified as deletions relative to ancestral loci. Right 
panel, SVs classified as insertions/duplications. c, Size spectra of deletions 
classified relative to ancestral loci. d, Size spectra of insertions/duplications. 

Figure 5 | Mapping hotspots of SV formation in the genome. a, Distribution
of SVs on chromosome 10 (‘chr10’). Above the ideogram, coloured bars 
indicate SV formation mechanisms (same colour scheme as in (b) and (c)); bar 
lengths relate to the logarithm of SV size. Below the ideogram, bar lengths are 
directly proportional to allele frequencies. Arrows indicate an SV hotspot near 
the centromere underlying mainly VNTR and several hotspots near the 
telomeres underlying mainly NAHR events. b, Enrichment of SVs inferred to 
be formed by the same formation mechanism for different genomic window 
sizes. Displayed is an enrichment of nearby, non-overlapping SVs formed by 
the same mechanism relative to an SV set where mechanism assignments are 
shuffled randomly. c, SV hotspots are mostly dominated by a single formation 
mechanism. Coloured bars depict numbers of SV hotspots in which at least 50% 
of the variants were inferred to be formed by a single formation mechanism. 
The average abundance of NAHR-classified SVs in NAHR hotspots was 70% 
(compared with 77% for VNTR-hotspots; 69% for NH). The grey bar (‘mixed’) 
corresponds to SV hotspots with no single mechanism dominating. 


in recurrent SV formation®”. In this regard, we noticed that out of the 
hotspots we report, six fall into critical regions of known genetic dis- 
orders associated with recurrent de novo deletions, including Miller- 
Dieker syndrome and Leri-Weill dyschondrosteosis (Supplementary 
Table 14). Irrespective of potential disease relevance, or inferred mech- 
anism of formation, our analysis revealed a map of SV hotspots that 
may constitute local centres of de novo SV formation, consistent with 
the concept that local genome architecture contributes to genomic 
instability*. 

Our study focused on characterizing deletions, which are often 
associated with disease’. Facilitated by ancestral analyses of SV loci, 
we also characterized insertions and tandem duplications, albeit in 
less detail than deletions. Companion papers with more detailed ana- 
lyses of MEIs and copy number variation within segmental duplica- 
tions are published elsewhere (Stewart, C. et al., unpublished results, 
and ref. 34). Of note, most SV discovery methods depend on mapping 
reads onto their genomic locus of origin, that is, the ‘accessible’ frac- 
tion of the genome, a fraction lessened in segmental duplications that 
are of high interest to SV analysis. Nonetheless, owing to the abilities 
of SV discovery methods in detecting SVs in these regions and in 
interpreting reads with multiple mapping positions, the ‘accessible’ 
fraction of the genome is higher for SVs than for SNPs"*. In the future, 
sequencing technologies generating longer DNA reads will increase the 
accessible genome, and will enable the assessment of SVs embedded in 
long repeat structures, such as balanced inversions. 

Our SV resource will enable the discovery, genotyping and imputa- 
tion of SVs in larger cohorts. Numerous genomes will be sequenced in 
the coming months to facilitate disease association studies. Systematic 
characterization of SVs in these genomes will benefit from the con- 
cepts and data sets presented here. 


METHODS SUMMARY 

Samples. Whole genome sequencing data for 179 unrelated individuals and six 
individuals from parent-offspring trios were obtained as part of the 1000GP. 
These data were generated with Illumina/Solexa, Roche/454 and Life
Technologies/SOLiD sequencing technology platforms.

SV discovery and breakpoint assembly. The SV discovery methods we applied 
comprised six RP, four RD, three SR, four AS, and two PD based methods. TIGRA 
(Chen, L. et al., unpublished results) was used for targeted breakpoint assembly. 

Experimental validation. We validated SV calls by PCR, array CGH and SNP
microarrays, targeted assembly, and custom microarray-based sequence capture. 
PCR was performed in various different laboratories**, CGH analysis was per- 
formed based on tiling array data provided by the Genome Structural Variation 
Consortium (ArrayExpress: E-MTAB-40), and SNP array analysis based on data 
obtained from the International HapMap Consortium
(http://hapmap.ncbi.nlm.nih.gov).

Genotyping. Genome STRiP (Handsaker, R. E., Korn, J. M., Nemesh, J. and 
McCarroll, S. A., unpublished results) was used for deletion genotyping in low- 
coverage sequence data. Initial genotype likelihoods were derived with a Bayesian 
model and imputation into a SNP genotype reference panel from the HapMap” 
(Hapmap3r2) was achieved with Beagle (v3.1;
http://faculty.washington.edu/browning/beagle/beagle.html).

SV formation mechanism analysis. SV breakpoints mapped at nucleotide reso- 
lution were analysed with BreakSeq’' to classify SVs relative to putative ancestral 
loci and to infer SV formation mechanisms. SV hotspots were mapped with 
custom Perl and R scripts. 


An actively accreting massive black hole in the dwarf
starburst galaxy Henize 2-10 


Amy E. Reines¹, Gregory R. Sivakoff¹, Kelsey E. Johnson¹,² & Crystal L. Brogan²


Supermassive black holes are now thought to lie at the heart of every 
giant galaxy with a spheroidal component, including our own Milky 
Way’”. The birth and growth of the first ‘seed’ black holes in the 
earlier Universe, however, is observationally unconstrained’ and we 
are only beginning to piece together a scenario for their subsequent 
evolution’. Here we report that the nearby dwarf starburst galaxy 
Henize 2-10 (refs 5 and 6) contains a compact radio source at the 
dynamical centre of the galaxy that is spatially coincident with a 
hard X-ray source. From these observations, we conclude that 
Henize 2-10 harbours an actively accreting central black hole with 
a mass of approximately one million solar masses. This nearby 
dwarf galaxy, simultaneously hosting a massive black hole and an 
extreme burst of star formation, is analogous in many ways to 
galaxies in the infant Universe during the early stages of black-hole 
growth and galaxy mass assembly. Our results confirm that nearby 
star-forming dwarf galaxies can indeed form massive black holes, 
and that by implication so can their primordial counterparts. 
Moreover, the lack of a substantial spheroidal component in 
Henize 2-10 indicates that supermassive black-hole growth may 
precede the build-up of galaxy spheroids. 

The starburst in Henize 2-10, a relatively nearby (9 megaparsecs, 
~30 million light years) blue compact dwarf galaxy, has attracted 
the attention of astronomers for decades®”’. Stars are forming in 
Henize 2-10 at a prodigious rate*''’* that is ten times that of the 
Large Magellanic Cloud”? (a satellite galaxy of the Milky Way), despite 
the fact that both of these dwarf galaxies have similar stellar masses'*"° 
and neutral hydrogen gas masses”'’. Most of the star formation in 
Henize 2-10 is concentrated in a large population of very massive 
and dense ‘super-star clusters’, the youngest having ages of a few 
million years and masses of one hundred thousand times the mass 
of the Sun’. The main optical body of the galaxy has an extent less than 
a kiloparsec (~3,000 light-years) in size and has a compact irregular 
morphology typical of blue compact dwarfs (Fig. 1). 

We observed Henize 2-10 at centimetre radio wavelengths with the 
Very Large Array (VLA) and in the near-infrared with the Hubble Space 
Telescope (HST) as part of a large-scale panchromatic study of nearby 
dwarf starburst galaxies harbouring infant super-star clusters'**°. A 
comparison between the VLA and HST observations drew our attention 
to a compact (<24 pc × 9 pc) central radio source located between two
bright regions of ionized gas (Fig. 2). These data exclude any associa- 
tion of this central radio source with a visible stellar cluster (Fig. 3; 
see Supplementary Information for a discussion of the astrometry). 
Furthermore, the radio emission from this source has a significant 
non-thermal component ($\alpha \sim -0.4$, $S_\nu \propto \nu^{\alpha}$, where $S_\nu$ is the flux density
at frequency $\nu$) between 4.9 GHz and 8.5 GHz, as noted in previous
studies of the galaxy’. An archival observation of Henize 2-10 taken 
with the Chandra X-ray Observatory reveals that a point source with 
hard X-ray emission is also coincident (to within the position uncer- 
tainty) with the central non-thermal radio source’® (see Supplementary 
Information). Typically, even powerful non-nuclear radio and X-ray 
sources (for example, supernova remnants and active X-ray binaries)
are at least an order of magnitude less luminous than the central source
in Henize 2-10 (see Supplementary Information). In contrast, the radio 
and hard X-ray luminosities of the central source in Henize 2-10, as well 
as their ratio, are similar to known low-luminosity active galactic nuclei 
powered by accretion onto a massive black hole”. 

The central, compact, non-thermal radio source in Henize 2-10 is 
also coincident with a local peak in Paα and Hα emission and appears to
be connected to a thin quasi-linear ionized structure between two bright 
and extended regions of ionized gas. This morphology is tantalizingly 
suggestive of outflow (Fig. 2). Although we cannot conclusively deter- 
mine whether or not this linear structure is physically connected to the 
brightest emitting regions with the data in hand, ground-based spec- 
troscopic observations” confirm a coherent velocity gradient along the 
entire ionized gas structure seen in Fig. 2, consistent with outflow or 
rotation. Moreover, a comparison between the central velocity of this 
ionized gas structure and the systemic velocity of the galaxy—derived 

Scale bar, 150 pc (500 light years).


Figure 1 | Henize 2-10. Henize 2-10 is a blue compact dwarf galaxy hosting a
concentrated region of extreme star formation. Using Hα (ref. 8) and 24 μm
(ref. 11) fluxes from the literature, we estimate a star-formation rate’? of
1.9 $M_\odot$ yr$^{-1}$, assuming that all of the emission is from the starburst and that the
contribution from the active nucleus is negligible. We estimate that Henize 2-10
has a stellar mass of $3.7 \times 10^{9}\,M_\odot$ from the integrated 2MASS $K_s$-band flux'*".
Neutral hydrogen observations of Henize 2-10 indicate a solid-body rotation
curve typical of dwarf galaxies with a maximum projected rotational velocity of
39 km s$^{-1}$ relative to the systemic velocity of the galaxy’. These observations
also indicate a dynamical mass of about $10^{10}\,M_\odot$ within 2.1 kiloparsecs (ref. 7).
The main optical body of the galaxy, shown here, is less than one kiloparsec
across. Henize 2-10 shows signs of having undergone an interaction, including
tidal-tail-like features in both its gaseous’ and stellar distributions (seen here).
In this three-colour HST image of the galaxy, we show ionized gas emission in
red (Hα) and the stellar continuum in green (~I-band, 0.8 μm) and blue
(~U-band, 0.3 μm). These archival data were taken with Wide Field and Planetary
Camera 2 (Hα) and the Advanced Camera for Surveys (U- and I-band). The
white box indicates the region shown in Figs 2 and 3.

Figure 2 | The active nucleus and ionized gas in Henize 2-10. The overall
morphology of the radio emission (green contours) in the central region of
Henize 2-10 matches that of the ionized gas detected with HST (colour images),
suggesting a shared origin. The active nucleus (plus symbol) is detected as a
non-thermal nuclear VLA radio source coincident with a Chandra point source with
hard X-ray emission. The nuclear source is also coincident with a local peak of
ionized gas emission and appears to be connected to the thin quasi-linear
structure between the two bright and extended regions of ionized gas: this is
suggestive of, although not proof of, outflow. The central source has 4.9 GHz and
8.5 GHz radio luminosities of $7.4 \times 10^{35}$ erg s$^{-1}$ and $1.0 \times 10^{36}$ erg s$^{-1}$,
respectively. The X-ray luminosity of the central source in the 2-10 keV band is
~$2.7 \times 10^{39}$ erg s$^{-1}$. The accretion rate of the $2 \times 10^{6}\,M_\odot$ black hole is
~$5 \times 10^{-6}\,M_\odot$ yr$^{-1}$, assuming an X-ray bolometric correction of 10 and an
accretion efficiency of 0.1. a, Narrowband imaging with the Near Infrared Camera
and Multi-Object Spectrometer (NICMOS) on the Hubble Space Telescope was
used to trace the ionized gas in Henize 2-10 using the Paα hydrogen
recombination line at 1.87 μm. Continuum emission was removed using a
neighbouring off-line narrowband filter. VLA 8.5 GHz (3.5 cm) radio contours
are over-plotted in green and the active galactic nucleus is marked with a plus
symbol. Contour levels are 9, 13, 17, 25, 33, 41 and 49 times the root-mean-square
noise (12 μJy per beam). The beam is shown in the lower right corner. b, Optical
narrowband imaging of the Hα hydrogen recombination line at 0.66 μm yields a
higher-resolution view of the ionized gas in Henize 2-10. The continuum has not
been subtracted in this archival image, leaving young star clusters also visible.


from observations of neutral hydrogen gas rotating as a solid body’— 
indicates that the position of the non-thermal radio source is consistent 
with the dynamical centre of the galaxy. 

Compact radio and hard X-ray emission at the centre of a galaxy are 
generally good indicators of accretion onto a massive black hole”’, but 
we have also considered alternative explanations for the data. As dis- 
cussed at length in the Supplementary Information, it is extremely 
unlikely that the nuclear source in Henize 2-10 is one or more super- 
nova remnants, more recently created supernovae, stellar-mass black- 
hole X-ray binaries, or some combination of these phenomena. Briefly, 
X-ray binaries are too weak in the radio, supernova remnants are too 
weak in hard X-rays, and young compact radio supernovae are ruled out 

Figure 3 | Young super-star clusters in Henize 2-10. The overall structure of
the radio emission (green contours) differs markedly from the distribution of
star clusters in the centre of the galaxy (colour images). In particular, the
non-thermal nuclear radio source does not have a detectable counterpart in these
broadband continuum images (plus symbol), excluding any association with a
visible star cluster. a, A near-infrared (~K-band) image of the central region of
Henize 2-10 overplotted with the same radio contours as in Fig. 2. HST/NICMOS
was used to observe the galaxy through a broadband filter centred at
2.1 μm, which primarily traces the distribution of stellar light. b, A
higher-resolution view of the star clusters is shown in this archival 0.8 μm (~I-band)
broadband image. The field of view is the same as in Fig. 2.


by observations using Very Long Baseline Interferometry~’. Although it 
may be possible to account for the radio and X-ray luminosities of the 
nuclear source with just the right combination of the abovementioned 
phenomena, the probability of such a coincidence is exceedingly low 
(see Supplementary Information). On the contrary, the radio and hard 
X-ray luminosities of the central source in Henize 2-10 are well within 
the range of known low-luminosity active galactic nuclei*’. 

In addition to ruling out young compact supernovae, the non-detec- 
tion of the nuclear radio source at higher resolution (~0.5 pc X 0.1 pc) 
using Very Long Baseline Interferometry”’ may also seemingly rule out 
the presence of an actively accreting massive black hole. However, 
Seyfert nuclei with steep radio spectra ($\alpha < -0.5$, $S_\nu \propto \nu^{\alpha}$) often
exhibit this ‘missing flux’ phenomenon when observed at increasingly 
higher spatial resolution™. In these active galactic nuclei, as much as 
~90% of the radio emission is absent on parsec scales and is expected 
to be dominated by extended low-surface-brightness features on larger 
scales, such as jets. This is in contrast to Seyferts with flat or positive 
radio spectra ($\alpha \geq 0$) and elliptical radio galaxies in which the radio
emission is concentrated in a compact core. The nuclear radio source 
in Henize 2-10 has a spectral index ($\alpha \sim -0.4$) similar to Seyfert nuclei
that are known to have reduced flux densities on parsec scales. 
Therefore, we do not consider the non-detection of the nuclear radio 
source at very high resolution to be incompatible with the presence of 
an active galactic nucleus in Henize 2-10. 

We conclude that an actively accreting massive black hole is the most
feasible explanation for the nuclear source in Henize 2-10. The compact 
radio and hard X-ray luminosities are consistent with the observed 
correlation for active galactic nuclei and we therefore estimate the mass 
of the black hole in Henize 2-10 using the so-called “fundamental plane
of black hole activity”’*. This empirical relationship relating black-hole
mass to the emitted compact radio and hard X-ray luminosities, spanning
nine orders of magnitude in black-hole mass, is given by the
equation $\log L_{\rm R} = 0.60 \log L_{\rm X} + 0.78 \log M + 7.33$, where $L_{\rm R}$ is the radio
luminosity at 5 GHz in erg s$^{-1}$, $L_{\rm X}$ is the 2-10 keV X-ray luminosity in
erg s$^{-1}$ and $M$ is the mass of the black hole in solar masses, $M_\odot$. Using
the observed radio luminosity of $7.4 \times 10^{35}$ erg s$^{-1}$ at 4.9 GHz and the
X-ray luminosity of $2.7 \times 10^{39}$ erg s$^{-1}$ in the 2-10 keV band, we calculate
$\log(M/M_\odot) = 6.3 \pm 1.1$ for the black hole in Henize 2-10. The
region in which the gravitational influence of a one-million-solar-mass 
black hole dominates that of the host galaxy subtends a very small angle 
on the sky at the distance of Henize 2-10 (<1 arcsecond for velocity 
dispersions >10 km s$^{-1}$). Thus, it is not surprising that kinematic
studies of Henize 2-10 have not previously revealed the presence of 
the black hole at its centre. 
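
The mass estimate above can be checked by inverting the fundamental-plane relation for $\log M$. The sketch below is illustrative only: the function name is hypothetical, and the luminosity exponents ($10^{35}$ and $10^{39}$ erg s$^{-1}$) are assumptions chosen because they reproduce the quoted $\log(M/M_\odot) = 6.3$; they should be checked against the published values.

```python
import math

def fundamental_plane_log_mass(l_radio, l_xray):
    """Invert log L_R = 0.60 log L_X + 0.78 log M + 7.33 for log10(M / M_sun).

    l_radio: 5 GHz radio luminosity in erg/s
    l_xray:  2-10 keV X-ray luminosity in erg/s
    """
    return (math.log10(l_radio) - 0.60 * math.log10(l_xray) - 7.33) / 0.78

# Assumed inputs: 7.4e35 erg/s (radio, 4.9 GHz) and 2.7e39 erg/s (X-ray)
log_m = fundamental_plane_log_mass(7.4e35, 2.7e39)
mass_msun = 10 ** log_m
```

With these inputs the relation gives a black-hole mass of order a million solar masses; the quoted uncertainty of about 1.1 dex reflects the scatter of the empirical relation rather than measurement error in the luminosities.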

Few dwarf galaxies are currently known to host massive black 
holes**”’; however, the discovery of an active nucleus in Henize 2-10 
opens up a new realm in which to search for local analogues of primordial 
black-hole growth (that is, dwarf starburst galaxies). While recent 
searches**”” have revealed growing numbers of nuclear black holes with 
masses similar to that in Henize 2-10, the host galaxies of these objects 
have very different properties from that of Henize 2-10. Most notably, 
they are not actively forming stars and have regular morphologies of 
disks and spheroids with well-defined optical nuclei”**°. Moreover, the 
majority of the black holes detected in these systems are radiating at high 
fractions of their Eddington limits” ~’, suggesting that the black holes are 
currently undergoing rapid growth. In contrast, the low-luminosity 
active galactic nucleus in Henize 2-10 is currently radiating significantly 
below its Eddington limit ($\sim 10^{-4}$ assuming an X-ray bolometric correction
of ten; see Supplementary Information), suggesting a different
evolutionary state. 
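
The Eddington ratio and accretion rate follow from simple arithmetic, sketched below as a consistency check. The inputs are assumptions taken from the values quoted in the text and the Fig. 2 caption (X-ray luminosity of about $2.7 \times 10^{39}$ erg s$^{-1}$, bolometric correction of 10, black-hole mass of about $2 \times 10^{6}\,M_\odot$, accretion efficiency of 0.1). The Eddington luminosity coefficient of $1.26 \times 10^{38}$ erg s$^{-1}$ per solar mass is the standard value for ionized hydrogen and is not taken from the paper.

```python
# Assumed inputs (see lead-in); cgs units throughout
L_X = 2.7e39               # 2-10 keV X-ray luminosity, erg/s
K_BOL = 10.0               # X-ray bolometric correction
M_BH = 2.0e6               # black-hole mass, solar masses
EFFICIENCY = 0.1           # radiative efficiency of accretion

L_EDD_PER_MSUN = 1.26e38   # Eddington luminosity per solar mass, erg/s
C = 2.998e10               # speed of light, cm/s
M_SUN_G = 1.989e33         # solar mass, g
SEC_PER_YEAR = 3.156e7     # seconds per year

l_bol = K_BOL * L_X                                  # bolometric luminosity, erg/s
eddington_ratio = l_bol / (L_EDD_PER_MSUN * M_BH)    # dimensionless
mdot_g_per_s = l_bol / (EFFICIENCY * C ** 2)         # from L = efficiency * mdot * c^2
mdot_msun_per_yr = mdot_g_per_s * SEC_PER_YEAR / M_SUN_G
```

Under these assumptions the source radiates at roughly one ten-thousandth of its Eddington limit, with an accretion rate of a few millionths of a solar mass per year.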

The results presented here have broad implications for our under- 
standing of the evolution of galaxies and their central black holes. The 
concurrent black-hole growth and extreme starburst in Henize 2-10 
probably resembles the conditions in low-mass, high-redshift galaxies 
during the early phases of galaxy assembly when interactions and 
mergers were common. Indeed, Henize 2-10 shows signs of having 
undergone an interaction, including tidal-tail-like features in both its 
gaseous’ and stellar distributions (Fig. 1). Additionally, it is intriguing 
that the massive black hole in Henize 2-10 does not appear to be
associated with a bulge, a nuclear star cluster or any other well-defined 
nucleus. This unusual property may reflect an early phase of black-hole 
growth and galaxy evolution that has not been previously observed. If 
so, this implies that primordial seed black holes could have pre-dated 
their eventual dwellings, thereby constraining theories for the forma- 
tion mechanisms of massive black holes and galaxies. 


Entanglement in a solid-state spin ensemble

Mike L. W. Thewalt”, Kohei M. Itoh® & John J. L. Morton!”


Entanglement is the quintessential quantum phenomenon. It is a 
necessary ingredient in most emerging quantum technologies, 
including quantum repeaters’, quantum information processing” 
and the strongest forms of quantum cryptography’. Spin ensembles, 
such as those used in liquid-state nuclear magnetic resonance*”, have 
been important for the development of quantum control methods. 
However, these demonstrations contain no entanglement and ulti- 
mately constitute classical simulations of quantum algorithms. Here 
we report the on-demand generation of entanglement between an 
ensemble of electron and nuclear spins in isotopically engineered, 
phosphorus-doped silicon. We combined high-field (3.4 T), low-temperature
(2.9 K) electron spin resonance with hyperpolarization
of the $^{31}$P nuclear spin to obtain an initial state of sufficient purity
to create a non-classical, inseparable state. The state was verified
using density matrix tomography based on geometric phase gates,
and had a fidelity of 98% relative to the ideal state at this field and
temperature. The entanglement operation was performed simultaneously,
with high fidelity, on $10^{10}$ spin pairs; this fulfils one of the
essential requirements for a silicon-based quantum information 
processor. 

Most quantum information processing algorithms applied to spin 
ensembles have been implemented in a regime of weak spin polariza- 
tion. However, owing to the very low purity of the states used, any 
exponential enhancement offered by quantum mechanics disappears 
when the scaling of total resources is considered. Highly mixed, or 
weakly initialized, ensembles are often interpreted as the sum of a 
perfectly mixed component (given by a normalized identity matrix 
in the density matrix representation) and a small amount, $\varepsilon$, of a pure
component, $\rho_0$; thus, $\rho_{\rm true} = (1-\varepsilon)I/d + \varepsilon\rho_0$, where $d$ is the dimensionality
of the state. The $I$ component is invariant under unitary
operations and is not directly observable by magnetic resonance, 
which produces measurements of the population differences across 
allowed electron and nuclear spin transitions. It is therefore straight- 
forward to ignore the maximally mixed component: this approach is 
called the ‘pseudo-pure approximation’”®. 

There are a number of entanglement witnesses or monotones that 
can distinguish entangled states from (classical) separable ones. A 
widely used test is the positive partial transpose (PPT) criterion, which 
is both a necessary and a sufficient test of entanglement for two 
coupled, spin-1/2 particles’*. Applying this test to the mixed state 
above, $\rho_{\rm true}$, it can be shown® that the minimum value of $\varepsilon$ which
permits the overall state to be entangled is 1/3. 
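
The 1/3 threshold can be verified numerically for a two-qubit pseudo-pure state whose pure component is a Bell state. The sketch below builds $\rho_{\rm true} = (1-\varepsilon)I/d + \varepsilon\rho_0$, takes the partial transpose over the second spin, and checks for a negative eigenvalue. The function names are hypothetical and the choice of Bell state is an assumption made for illustration.

```python
import numpy as np

def pseudo_pure(eps, rho0):
    """Mix a pure-state density matrix rho0 with the maximally mixed state."""
    d = rho0.shape[0]
    return (1 - eps) * np.eye(d) / d + eps * rho0

def partial_transpose(rho):
    """Transpose the second qubit of a two-qubit (4x4) density matrix."""
    r = rho.reshape(2, 2, 2, 2)                    # r[i, j, k, l] = <ij| rho |kl>
    return r.transpose(0, 3, 2, 1).reshape(4, 4)   # swap j and l

def is_entangled_ppt(rho, tol=1e-12):
    """PPT criterion: a negative partial-transpose eigenvalue implies entanglement."""
    return np.linalg.eigvalsh(partial_transpose(rho)).min() < -tol

# Pure component: Bell state (|00> + |11>) / sqrt(2)
bell = np.zeros(4)
bell[0] = bell[3] = 1 / np.sqrt(2)
rho0 = np.outer(bell, bell)

def min_pt_eigenvalue(eps):
    return np.linalg.eigvalsh(partial_transpose(pseudo_pure(eps, rho0))).min()
```

For this family the smallest partial-transpose eigenvalue is $(1 - 3\varepsilon)/4$, which changes sign at $\varepsilon = 1/3$, in agreement with the threshold stated above.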

Typical values for $\varepsilon$ in liquid-state nuclear magnetic resonance and
electron spin resonance (using 10-GHz excitation at a temperature of
5 K) are ~$10^{-5}$ and ~$10^{-2}$, respectively. These values are well below
the required threshold for the PPT test. Thus, although experiments 
performed in this regime provide a valuable test bed for techniques in 
entanglement generation and detection’, the states created are only 
pseudo-entangled, and are fully separable. (A notable exception was 
the use of chemical methods to generate highly polarized hydrogen spin 
pairs’, though that is a single-shot experiment with limited scalability.) 


To overcome this limit, we require states of higher initial purity and a 
method to measure the I component of the density matrix. 

We follow a hybrid approach, using both the electron spin and the 
nuclear spin associated with a phosphorus donor in silicon. Isolated 
donors in isotopically engineered semiconductors are of particular 
interest as they possess excellent decoherence characteristics (both 
the electron and the nuclear coherence times, T_2, exceed seconds'’'”), 
can be controlled with high fidelity using microwave and radio- 
frequency pulses'*"’, and are promising for integrating quantum tech- 
nologies into conventional semiconductor devices"». 

Figure 1 | Sequences for nuclear spin hyperpolarization and entanglement 
generation for this coupled S = 1/2, I = 1/2 spin system. a, The initial state is 
at thermal equilibrium, where populations (green) are distributed according to 
the electron spin (e) polarization at this magnetic field and temperature (see 
text). A pair of applied microwave and radio-frequency π pulses move spin 
populations to favour the |↑⟩ nuclear spin (n) state. After some time, t_wait > 
T_1e, there is a significant majority population in state |3⟩, or |↑↓⟩ (where the 
first and second arrows indicate the nuclear and electron spins, respectively). 
Nuclear spin and cross-relaxation processes occur on timescales much longer 
than T_1e. b, Illustration of the ²⁸Si-P coupled spin system. c, Starting from the 
hyperpolarized state in a, an electron spin coherence is generated and 
transformed into the final entangled state, containing a superposition of |↑↑⟩ 
and |↓↓⟩. d, The growth in the electron spin echo intensity measured on the 
|1⟩-|3⟩ transition provides a measure of the population ratio, α. a.u., arbitrary 
units. e, This hyperpolarization sequence minimizes the linear entropy of the 
two-spin state for a given value of α. 

Neglecting the weak polarization of the nuclear spin, the initial state 
populations are determined by the electron spin Zeeman energy, as 
shown in Fig. 1a, where α = exp(−gμ_B B/k_B T), g is the electron g-factor, 
μ_B is the Bohr magneton, k_B is Boltzmann’s constant, and B and T are 
the experimental magnetic field and temperature, respectively. At a 
high magnetic field (3.4 T) and low temperature (2.9 K), the donor 
electron spin is thermally polarized to ~66%; however, the ³¹P nuclear 
spin, with a much weaker magnetic moment, has only ~0.04% polarization. 
Various methods, collectively known as dynamic nuclear 
polarization’*”’, exist for indirectly transferring electron spin polarization 
to the nuclear spin and often exploit cross-relaxation processes 
involving simultaneous electron and nuclear spin flips. Here we exploit 
the relative absence of cross-relaxation, leading to a substantial difference 
in the relaxation times of the electron and nuclear spins’’, to 
hyperpolarize the nuclear spin rapidly and with high efficiency. This 
hyperpolarization process is similar to ‘algorithmic cooling’ methods, 
whereby a particular quantum bit (qubit) relaxes quickly owing to 
coupling to a heat bath’. 
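
The thermal polarizations quoted above can be reproduced approximately with the following sketch. The physical constants are CODATA values and the electron g-factor is the one given in the Methods Summary; the bare ³¹P gyromagnetic ratio (17.235 MHz/T) is an assumption supplied here, and the hyperfine contribution to the nuclear splitting is ignored, so the nuclear figure is only an order-of-magnitude check.

```python
import math

MU_B = 9.2740100783e-24  # Bohr magneton, J/T
K_B = 1.380649e-23       # Boltzmann constant, J/K
H = 6.62607015e-34       # Planck constant, J s
G_E = 1.9987             # donor electron g-factor (Methods Summary)
GAMMA_31P = 17.235e6     # assumed bare 31P gyromagnetic ratio, Hz/T

def population_ratio(splitting_joule, temperature_kelvin):
    """Boltzmann ratio of upper to lower level populations (alpha in the text)."""
    return math.exp(-splitting_joule / (K_B * temperature_kelvin))

def polarization(splitting_joule, temperature_kelvin):
    """Two-level thermal polarization, (1 - alpha) / (1 + alpha)."""
    alpha = population_ratio(splitting_joule, temperature_kelvin)
    return (1 - alpha) / (1 + alpha)

B_FIELD, TEMPERATURE = 3.4, 2.9  # tesla, kelvin (values quoted in the text)

p_electron = polarization(G_E * MU_B * B_FIELD, TEMPERATURE)
p_nuclear = polarization(H * GAMMA_31P * B_FIELD, TEMPERATURE)  # Zeeman term only

print(f"electron polarization: {100 * p_electron:.1f}%")
print(f"nuclear polarization:  {100 * p_nuclear:.3f}%")
```

The electron value comes out close to the ~66% quoted, and the nuclear value is of the same order as the ~0.04% quoted.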

Figure 1 illustrates our method for tackling the twin challenges of 
measuring and minimizing the I component in the density matrix of the 
coupled electron-nuclear spin system. The hyperpolarization of the 
nuclear spin can be understood as a SWAP operation (which inter- 
changes the states of two qubits) with the (thermally polarized) electron 
spin, using a combination of resonant microwave and radio-frequency 
π pulses. This is followed by a delay t_wait, which is substantially longer 
than the electron spin relaxation time, T_1e (specifically, t_wait ~ 8T_1e), 
during which the electron spin relaxes back to thermal equilibrium. On 
this timescale, other relaxation processes (such as pure nuclear spin flips 
or electron-nuclear spin flip-flops) are orders of magnitude slower and 
can be neglected. The resulting hyperpolarized state is 


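\[
\rho = \frac{4}{Z^{2}}\left(\alpha\,|1\rangle\langle 1| + \alpha^{2}\,|2\rangle\langle 2| + |3\rangle\langle 3| + \alpha\,|4\rangle\langle 4|\right)
\]


where Z = 2(1 + α) is a normalizing constant. 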
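
The population bookkeeping behind this state can be sketched as follows. The level labelling (|1⟩ and |2⟩ electron spin-up, |1⟩ and |3⟩ nuclear spin-up) is inferred from the Fig. 1 caption, and the particular pair of π pulses used here is an assumed choice that produces the described population pattern; it is not taken from the original pulse sequence. With these assumptions the populations come out as (α, α², 1, α) × 4/Z² with Z = 2(1 + α).

```python
from fractions import Fraction

def thermal_populations(alpha):
    """Unnormalized thermal populations of |1>..|4>; |1> and |2> are electron spin-up."""
    return [alpha, alpha, Fraction(1), Fraction(1)]

def pi_pulse(pops, i, j):
    """An ideal pi pulse exchanges the populations of levels i and j (1-based)."""
    pops = list(pops)
    pops[i - 1], pops[j - 1] = pops[j - 1], pops[i - 1]
    return pops

def relax_electron(pops, alpha):
    """Re-thermalize the electron within each nuclear manifold: (|1>,|3>) and (|2>,|4>)."""
    p1, p2, p3, p4 = pops
    up, down = alpha / (1 + alpha), 1 / (1 + alpha)
    return [(p1 + p3) * up, (p2 + p4) * up, (p1 + p3) * down, (p2 + p4) * down]

def hyperpolarize(alpha):
    pops = thermal_populations(alpha)
    pops = pi_pulse(pops, 2, 4)         # assumed microwave pi pulse
    pops = pi_pulse(pops, 1, 2)         # assumed radio-frequency pi pulse
    pops = relax_electron(pops, alpha)  # waiting much longer than T_1e
    total = sum(pops)
    return [p / total for p in pops]

alpha = Fraction(217, 1000)
Z = 2 * (1 + alpha)
expected = [4 * x / Z**2 for x in (alpha, alpha**2, 1, alpha)]
print(hyperpolarize(alpha) == expected)  # True
```

Nuclear relaxation and cross-relaxation are neglected, in line with the statement that they are orders of magnitude slower than T_1e.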

Although spin echo sequences can only be used to probe the population 
differences across energy levels, we can obtain a direct measure 
of the population ratio, α, by measuring the electron spin echo amplitude 
between levels |1⟩ and |3⟩ before and after the hyperpolarization 
sequence, as shown in Fig. 1d. Owing to the enhanced polarization of 
the nuclear spin, a spin echo measured on this transition increases by a 
factor of 2/(α + 1) in comparison with the measurement from a fully 
relaxed thermal state. This measure is strictly conservative: it places a 
lower bound on the true polarization of the electron, as imperfections 
such as pulse errors or residual relaxation processes only lead to a 
lower apparent state purity. Using this measure, we observe an 
enhancement of the echo intensity by a factor of 1.643(2), corresponding 
to an upper bound of α = 0.217(2). 
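
Inverting the enhancement factor 2/(α + 1) gives α directly. The sketch below also propagates the quoted uncertainty to first order; the function names are hypothetical.

```python
def alpha_from_enhancement(enhancement):
    """Invert enhancement = 2 / (alpha + 1) for the population ratio alpha."""
    return 2.0 / enhancement - 1.0

def alpha_uncertainty(enhancement, enhancement_error):
    """First-order error propagation: |d(alpha)/d(enhancement)| = 2 / enhancement**2."""
    return 2.0 / enhancement**2 * enhancement_error

def electron_polarization(alpha):
    """Thermal polarization corresponding to a population ratio alpha."""
    return (1.0 - alpha) / (1.0 + alpha)

alpha_est = alpha_from_enhancement(1.643)
alpha_err = alpha_uncertainty(1.643, 0.002)
print(f"alpha = {alpha_est:.3f} +/- {alpha_err:.3f}")
print(f"implied electron polarization = {100 * electron_polarization(alpha_est):.1f}%")
```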

Linear spin entropy (defined as N[1 − Tr(ρ²)]/(N − 1) for an N-dimensional 
Hilbert space) is a useful characterization of a state’s 
purity, and ranges from one, for maximally mixed states, to zero, for 
pure states. Our hyperpolarization sequence corresponds to a decrease 
in linear spin entropy, made possible by the open quantum system’s 
contact with the lattice heat bath (Fig. 1e). Importantly, this approach 
leads to the minimum possible linear entropy given the electron spin 
polarization resource and type of relaxation present’*. Entanglement is 
maximized in a mixed, two-qubit density matrix by first minimizing 
the linear entropy and then generating an entangled coherence across 
the levels with the largest and second-smallest populations'””°. Following 
this strategy, we create an entangled state using a coherence-generating 
microwave π^(13)/2 pulse (where the superscript denotes the pair of levels 
addressed by the pulse) followed by a radio-frequency π^(34) pulse (Fig. 1c), 
yielding the target state: 


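\[
\rho = \frac{2}{Z^{2}}
\begin{pmatrix}
1+\alpha & 0 & 0 & 1-\alpha \\
0 & 2\alpha^{2} & 0 & 0 \\
0 & 0 & 2\alpha & 0 \\
1-\alpha & 0 & 0 & 1+\alpha
\end{pmatrix}
\]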


This density matrix is entangled according to the PPT criterion when 
α ≤ 0.432; other preparation methods (such as pseudo-pure state preparation) 
require substantially higher polarization (Supplementary 
Information). 
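
Both the entropy reduction and the 0.432 threshold can be checked numerically. The sketch assumes that the hyperpolarized populations are proportional to (α, α², 1, α), and that the target state has diagonal entries proportional to (1 + α, 2α², 2α, 1 + α) with a coherence proportional to 1 − α between |1⟩ and |4⟩. The partial transpose then contains the 2 × 2 block [[2α², 1 − α], [1 − α, 2α]], which has a negative eigenvalue when 4α³ < (1 − α)².

```python
def linear_entropy(populations):
    """N[1 - Tr(rho^2)] / (N - 1) for a diagonal density matrix."""
    n = len(populations)
    purity = sum(p * p for p in populations)
    return n * (1 - purity) / (n - 1)

def thermal(alpha):
    z = 2 * (1 + alpha)
    return [alpha / z, alpha / z, 1 / z, 1 / z]

def hyperpolarized(alpha):
    norm = (1 + alpha) ** 2
    return [alpha / norm, alpha**2 / norm, 1 / norm, alpha / norm]

def ppt_determinant(alpha):
    """Determinant of the 2x2 partial-transpose block; negative means entangled."""
    return 4 * alpha**3 - (1 - alpha) ** 2

def ppt_threshold(tolerance=1e-9):
    """Bisection for the alpha at which the determinant changes sign on [0, 1]."""
    low, high = 0.0, 1.0
    while high - low > tolerance:
        middle = (low + high) / 2
        if ppt_determinant(middle) < 0:
            low = middle
        else:
            high = middle
    return (low + high) / 2

alpha = 0.217
print(f"thermal entropy:        {linear_entropy(thermal(alpha)):.3f}")
print(f"hyperpolarized entropy: {linear_entropy(hyperpolarized(alpha)):.3f}")
print(f"PPT threshold:          {ppt_threshold():.3f}")
```

Bisection is sufficient here because the determinant increases monotonically with α on the interval [0, 1].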

Having prepared the initial state and performed an entangling 
operation, we now use density matrix tomography to extract the final 
two-spin state. Owing to the weak magnetic moment of nuclear spins 


70 | NATURE | VOL 470 | 3 FEBRUARY 2011 


and necessarily low donor concentration in our sample, we are 
restricted to non-projective measurements of the electron spin 
ensemble along the σ_x and σ_y Pauli bases, which can be performed 
selectively on the m_I state of the nuclear spin (in product operator 
formalism, these bases can be written as S_x I^α and S_y I^α). 

Diagonal elements of the density matrix (corresponding to state 
populations) are obtained by mapping pairs of population differences 
into an electron spin echo on the |1⟩-|3⟩ transition (S_{x,y} I^α). The accurate 
detection of off-diagonal elements (coherences) is a more elaborate 
process, made by selectively labelling the coherence between each pair 
of eigenstates with a distinguishable, time-varying phase’. By this pro- 
cess, a particular phase accumulation rate provides the signature of a 
particular coherence, allowing the off-diagonal elements to be recon- 
structed from the amplitudes in the Fourier transform of a measured 
signal. 

Here we follow an approach inspired by the Aharonov-Anandan 
geometric phase gate*’”’ to apply arbitrary phases in a fixed time to the 
four different eigenstates, and thus separately label each of the possible 
coherences. We apply two π pulses, along different axes, across a 
transition between a pair of eigenstates. The phase acquired by each 
eigenstate is opposite and equal to half the solid angle of its trajectory 
on the Bloch sphere (Fig. 2a). Thus, applying π_0 followed by −π_φ 
(subscripts denote pulse phase and, thus, nominal rotation axis) leads 
eigenstates |1⟩ and |3⟩ to assume trajectories of equal and opposite 
solid angle, ±2φ. A similar operation, π_0 followed by −π_θ, is 
applied to the nuclear spin transition, such that the total operator 
describing the action of these four pulses is 


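\[
U(\phi,\theta) =
\begin{pmatrix}
e^{-i\phi} & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & e^{i(\phi-\theta)} & 0 \\
0 & 0 & 0 & e^{i\theta}
\end{pmatrix}
\]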


Figure 2 | Electron and nuclear spin phase rotations reveal the off-diagonal 
elements of the density matrix. a, Under the application of two consecutive 
π pulses around different axes (φ), the eigenstates |1⟩ and |3⟩ undergo closed 
trajectories on the Bloch sphere with equal and opposite solid angles, Ω = ±2φ. 
Each state picks up a phase equal to half this solid angle. b, This π_0, −π_φ phase 
gate is applied to both electron |1⟩-|3⟩ and nuclear |3⟩-|4⟩ transitions, where 
the two phases are varied by different increments, δφ and δθ, as the experiment 
is repeated. Example oscillations are shown for three experiments where we 
generate an electron coherence, |1⟩⟨3|, a nuclear coherence, |4⟩⟨3|, and a zero 
quantum coherence, |2⟩⟨3|. c, Fourier transforms (FT) of the oscillations with 
respect to increment number show peaks located at the frequencies 0.050(8), 
0.031(5) and −0.079(8), in agreement with the frequencies that were set, 
ν_φ = 2π/δφ = 0.05 and ν_θ = 2π/δθ = 0.03. 

The value of φ is incremented by δφ on each shot of the experiment, 
with effective frequency ν_φ = 2π/δφ (and similarly for θ, δθ and ν_θ). 
We then map each off-diagonal element of the density matrix in turn 
into S_{x,y} I^α using a set of appropriate microwave and radio-frequency π 
pulses, and measure the amplitude of the Fourier component at the 
effective frequency corresponding to that coherence. Quadrature 
measurement allows us to discriminate between positive and negative 
frequencies. The presence of other Fourier peaks would be illustrative 
of pulse errors in the mapping sequence, but as seen in Fig. 2b, c, such 
errors are negligible even in the absence of operations such as phase 
cycling. 
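
The principle of frequency labelling can be illustrated with a toy signal. In the sketch below each coherence is modelled as an ideal complex (quadrature) oscillation at the frequency reported in Fig. 2c, with the zero-quantum coherence placed at −(0.05 + 0.03). The number of increments (100) and the noise-free signal model are assumptions made for illustration only, not the experimental parameters.

```python
import cmath

def labelled_signal(frequency, n_shots, amplitude=1.0):
    """Ideal quadrature signal whose phase advances by 2*pi*frequency per shot."""
    return [amplitude * cmath.exp(2j * cmath.pi * frequency * m) for m in range(n_shots)]

def dft_peak_frequency(signal):
    """Signed frequency (cycles per increment) of the strongest DFT component."""
    n = len(signal)
    best_bin, best_magnitude = 0, -1.0
    for k in range(n):
        amplitude = sum(s * cmath.exp(-2j * cmath.pi * k * m / n)
                        for m, s in enumerate(signal))
        if abs(amplitude) > best_magnitude:
            best_bin, best_magnitude = k, abs(amplitude)
    if best_bin > n // 2:  # upper half of the spectrum holds negative frequencies
        best_bin -= n
    return best_bin / n

N_SHOTS = 100  # assumed number of phase increments
coherences = {"electron": 0.05, "nuclear": 0.03, "zero-quantum": -0.08}
recovered = {name: dft_peak_frequency(labelled_signal(nu, N_SHOTS))
             for name, nu in coherences.items()}
for name, nu in recovered.items():
    print(f"{name}: {nu:+.2f}")
```

Because the signal is complex, the sign of the frequency is retained. A real-valued (single-channel) signal would give symmetric peaks at ±ν and could not separate the zero-quantum coherence from a double-quantum one.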

By combining our measurements of the identity component and the 
diagonal and off-diagonal elements of the density matrix of the 
electron-nuclear spin system, we obtain the following expression for ρ: 


\[
\rho =
\begin{pmatrix}
0.382 & 0.003+0.000i & -0.035-0.039i & 0.272 \\
0.003-0.000i & 0.017 & -0.000+0.001i & 0.001+0.003i \\
-0.035+0.039i & -0.000-0.001i & 0.174 & -0.055-0.042i \\
0.272 & 0.001-0.003i & -0.055+0.042i & 0.427
\end{pmatrix}
\]


This state has a minimum eigenvalue under the PPT test of 
—0.19(1) and a concurrence, C, of 0.43(4), each of which confirms 
the presence of finite entanglement. The results of this tomography 
process are shown in Fig. 3. The fidelity of the measured density matrix 
with respect to the target state, given that α = 0.217, is 98.2(2)%, and is 
68(2)% with respect to an ideal Bell state (α = 0). To obtain the uncertainty 
in these values, we used Monte Carlo generation of physical 
density matrices based on the standard error of each matrix element 
due to noise (Supplementary Information). 
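
The quoted PPT eigenvalue and concurrence can be cross-checked approximately by treating the measured matrix as an 'X state', that is, by keeping only the populations and the |1⟩⟨4| coherence and neglecting the other, much smaller off-diagonal elements. This simplification is an assumption made here for illustration and is not the analysis used to obtain the values in the text.

```python
import math

# Elements of the measured density matrix quoted in the text.
RHO_22, RHO_33 = 0.017, 0.174
RHO_14 = 0.272

def xstate_concurrence(r14, r22, r33):
    """Concurrence of an X state dominated by the |1><4| coherence."""
    return max(0.0, 2.0 * (abs(r14) - math.sqrt(r22 * r33)))

def xstate_min_ppt_eigenvalue(r14, r22, r33):
    """Smallest eigenvalue of the block [[r22, r14], [r14, r33]] of the partial transpose."""
    mean = (r22 + r33) / 2
    half_difference = (r22 - r33) / 2
    return mean - math.hypot(half_difference, abs(r14))

concurrence = xstate_concurrence(RHO_14, RHO_22, RHO_33)
min_eigenvalue = xstate_min_ppt_eigenvalue(RHO_14, RHO_22, RHO_33)
print(f"approximate concurrence:        {concurrence:.2f}")
print(f"approximate min PPT eigenvalue: {min_eigenvalue:.2f}")
```

Both approximate values fall within the quoted uncertainties of C = 0.43(4) and −0.19(1).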

The finite entanglement shown can offer direct advantages over 
classical methods in applications such as quantum sensors”. To 
achieve higher-purity entangled states, we could use lower tempera- 
tures; for example, we would expect C ~ 0.99 if these experiments were 
performed at 0.8K. Complementary to this approach, entanglement 
purification could be performed using a larger Hilbert space at each 
node™, for example using a donor atom with a higher nuclear spin 
(such as bismuth, with I = 9/2). 

The electron-nuclear spin entanglement generated here could also 
be mapped into an entangled state between nuclear spin pairs’. By 
interchanging (by SWAP) the state of the electron spin with a second, 
coupled nucleus, for example, nuclear spin entanglement could be 
attained in a regime where the thermal polarization of the nuclei would 
be orders of magnitude too small and the direct coupling between 
them weak. Clusters of up to eight nuclei coupled to a single electron 
spin have been explored in other materials’, although the scaling of 
such an approach seems limited. A scalable network of entangled 
nuclear spins could be generated by exploiting the ability to ionize 
the donor and transfer the electron onto a neighbouring donor site’””*. 
These operations, combined with single-shot read-out of the phosphorus 
donor spin” and globally controlled electron-nuclear spin entanglement 
such as we have demonstrated, form the basis for a cluster-state 
quantum computer in silicon’. 

Figure 3 | Measuring an entangled density matrix. a, The full pulse sequence 
used to prepare, entangle and measure the two-spin state. The final read-out 
stage was changed according to the density matrix element being measured: 
examples are shown for the |1⟩⟨2| and |1⟩⟨4| states. b, The obtained density 
matrix is shown as solid bars, and the dashed outline (zero where not shown) 
shows that of an ideal state given α = 0.217. The fidelity of the ideal state with 
the measured density matrix is 98%. 


METHODS SUMMARY 


Si:P consists of an electron spin, S = 1/2 (g = 1.9987), coupled to the nuclear spin, 
I = 1/2, of ³¹P through an isotropic hyperfine coupling of a = 4.19 mT. The 
W-band electron spin resonance signal comprises two lines (one for each nuclear 
spin projection, M_I = ±1/2). Our experiments were performed on the low-field 
line of the electron spin resonance doublet, corresponding to M_I = 1/2. At 2.9 K 
and 3.36 T, the electron and nuclear spin relaxation times were measured to be 
approximately 0.6 s and 100 s, respectively. 

The sample consists of a ²⁸Si-enriched single crystal about 0.5 mm in diameter with 
a residual ²⁹Si concentration of order 70 p.p.m., produced by decomposing isotopically 
enriched silane in a recirculating reactor to produce poly-silicon rods, followed 
by floating-zone crystallization. Phosphorus doping of ~10'* cm” * was achieved by 
adding dilute PH₃ gas to the ambient argon during the final floating-zone 
single-crystal growth. Further information on the sample growth has been reported 
elsewhere”. 

Pulsed electron spin resonance experiments were performed using a W-band 
(94-GHz) Bruker ELEXSYS 680 spectrometer, modified to allow microwave phase 
control and equipped with a 6-T superconducting magnet and a low-temperature 
helium-flow cryostat (Oxford CF935). The cryostat was pumped to achieve a 
temperature of 2.88 K (internal thermocouple) consistent with the spin temperature 
measurement (see text). Typical pulse times were 56 ns for a microwave π 
pulse and 100 µs for a radio-frequency π pulse. To achieve arbitrary phase control, 
we generated radio-frequency pulses using a Rohde and Schwarz AFQ100B 
together with an Amplifier Research 500 W amplifier. 
Femtosecond X-ray protein nanocrystallography 

X-ray crystallography provides the vast majority of macromolecular 
structures, but the success of the method relies on growing crystals of 
sufficient size. In conventional measurements, the necessary increase 
in X-ray dose to record data from crystals that are too small leads to 
extensive damage before a diffraction signal can be recorded’. It is 
particularly challenging to obtain large, well-diffracting crystals of 
membrane proteins, for which fewer than 300 unique structures have 
been determined despite their importance in all living cells. Here we 
present a method for structure determination where single-crystal 
X-ray diffraction ‘snapshots’ are collected from a fully hydrated 
stream of nanocrystals using femtosecond pulses from a hard-X- 
ray free-electron laser, the Linac Coherent Light Source*. We prove 
this concept with nanocrystals of photosystem I, one of the largest 
membrane protein complexes’. More than 3,000,000 diffraction 
patterns were collected in this study, and a three-dimensional data 
set was assembled from individual photosystem I nanocrystals 
(~200 nm to 2 µm in size). We mitigate the problem of radiation 
damage in crystallography by using pulses briefer than the timescale 
of most damage processes. This offers a new approach to structure 
determination of macromolecules that do not yield crystals of suf- 
ficient size for studies using conventional radiation sources or are 
particularly sensitive to radiation damage. 

Radiation damage has always limited resolution in biological 
imaging using electrons or X-rays’. With the recent invention of the 
femtosecond X-ray laser, an opportunity has arisen to break the nexus 
between radiation dose and spatial resolution. It has been proposed 
that femtosecond X-ray pulses can be used to outrun even the fastest 
damage processes by using single pulses so brief that they terminate 
before the manifestation of damage to the sample®. Experiments at the 
FLASH free-electron laser (FEL), Germany, confirmed the feasibility of 
‘diffraction before destruction’ at resolution lengths down to 60 Å on 
test samples fixed on silicon nitride membranes’. It was predicted that 
the irradiance (or power density) of focused pulses from a hard-X-ray 
FEL such as the Linac Coherent Light Source (LCLS), USA, would be 
sufficient to produce diffraction patterns at near-atomic resolution’. 

We demonstrate here that this notion of diffraction before destruc- 
tion operates at subnanometre resolution, using the membrane protein 
photosystem I as a model system, and establish an approach to structure 
determination based on X-ray diffraction data from a stream of nano- 
crystals**. Membrane proteins have a central role in the functioning of 
cells and viruses, yet our knowledge of the structure and dynamics 
responsible for their functioning remains limited. Photosystem I is a 
large membrane protein complex (1-MDa molecular mass, 36 proteins, 
381 cofactors) that acts as a biosolar energy converter in the process of 
oxygenic photosynthesis. Its crystals display the symmetry of space 
group P6₃, with unit-cell parameters a = b = 281 Å and c = 165 Å, 
and consist of 78% solvent by volume. We show that diffraction data 
can be recorded from these fragile protein nanocrystals before destruc- 
tion occurs. Furthermore, we demonstrate that structure factors can be 
extracted from the ‘partial’ reflections of tens of thousands of single- 
crystal diffraction snapshots, showing that interpretable high-quality, 
three-dimensional (3D) structure factor data can be obtained from a 
suspension of submicrometre crystals. 

Figure 1 | Femtosecond nanocrystallography. Nanocrystals flow in their 
buffer solution in a gas-focused, 4-µm-diameter jet at a velocity of 10 m s⁻¹ 
perpendicular to the pulsed X-ray FEL beam that is focused on the jet. Inset, 
environmental scanning electron micrograph of the nozzle, flowing jet and 
focusing gas*’. Two pairs of high-frame-rate pnCCD detectors’” record low- 
and high-angle diffraction from single X-ray FEL pulses, at the FEL repetition 
rate of 30 Hz. Crystals arrive at random times and orientations in the beam, and 
the probability of hitting one is proportional to the crystal concentration. 

Our experimental set-up (Fig. 1 and Methods) records single-crystal 
diffraction data from a stream of crystals carried in a 4-µm-diameter, 
continuous liquid water jet” that flows across the focused LCLS X-ray 
beam in vacuum at 10 µl min⁻¹. In contrast to cryo-electron microscopy" 
or standard crystallography on microcrystals’, which require 
cryogenic cooling, these data were collected on fully hydrated, 3D 
nanocrystals. The crystal located in the interaction region when an 
X-ray pulse arrives gives rise to a diffraction pattern that is detected 
on a set of two low-noise, X-ray p-n junction charge-coupled device 
(pnCCD) modules” and read out before the arrival of the next pulse at 
the FEL repetition rate of 30 Hz, or 1,800 patterns per minute. The 
photon energy of the X-ray pulses was 1.8 keV (6.9-Å wavelength), with 
more than 10¹² photons per pulse at the sample and pulse durations of 
10, 70, and 200 fs (ref. 13). An X-ray fluence of 900 J cm⁻² was achieved 
by focusing the FEL beam to a full-width at half-maximum of 7 µm, 
corresponding to a sample dose of up to 700 MGy per pulse (calculated 
using the program RADDOSE”) and a peak power density in excess of 
10¹⁶ W cm⁻² at 70-fs duration. In contrast, the typical tolerable dose in 
conventional X-ray experiments is only about 30 MGy (ref. 1). A single 
LCLS X-ray pulse destroys any solid material placed in this focus, but 
the stream replenishes the vaporized sample before the next pulse. 
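
The beam parameters above are mutually consistent, as a short calculation shows. The sketch takes the photon number as 10¹² per pulse and uses the standard conversion hc ≈ 12,398.42 eV Å. Dividing the fluence by the pulse duration gives the irradiance averaged over the pulse, which is a simplification of the true temporal profile.

```python
H_C_EV_ANGSTROM = 12398.42   # h*c in eV * angstrom
EV_TO_JOULE = 1.602176634e-19

def wavelength_angstrom(photon_energy_ev):
    """Photon wavelength from photon energy."""
    return H_C_EV_ANGSTROM / photon_energy_ev

def pulse_energy_joule(n_photons, photon_energy_ev):
    """Total energy carried by a pulse of n_photons photons."""
    return n_photons * photon_energy_ev * EV_TO_JOULE

def mean_irradiance_w_cm2(fluence_j_cm2, duration_s):
    """Fluence divided by pulse duration (average power density over the pulse)."""
    return fluence_j_cm2 / duration_s

wavelength = wavelength_angstrom(1800.0)
energy = pulse_energy_joule(1e12, 1800.0)
irradiance = mean_irradiance_w_cm2(900.0, 70e-15)
dose_ratio = 700.0 / 30.0  # per-pulse dose relative to the conventional tolerable dose

print(f"wavelength:      {wavelength:.2f} angstrom")
print(f"pulse energy:    {1e3 * energy:.2f} mJ")
print(f"mean irradiance: {irradiance:.2e} W/cm^2")
print(f"dose ratio:      {dose_ratio:.0f}x")
```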

The front detector module, located close to the interaction region, 
recorded high-angle diffraction to a resolution of 8.5 Å, whereas the 
rear module intersected diffraction at resolutions in the range of 4,000 
to 100 Å. We observed diffraction from crystals smaller than ten unit 
cells on a side, as determined by examining the data recorded on the 
rear pnCCDs (Fig. 2). A crystal with a side length of N unit cells gives 
rise to diffraction features that are finer by a factor of 1/N than the 
Bragg spacing (that is, with N − 2 fringes between neighbouring Bragg 
peaks), providing a simple way to determine the projected size of the 
nanocrystal. Images of crystal shapes obtained using an iterative phase 
retrieval method’*"* are shown in Fig. 2. The 3D Fourier transform of 
the crystal shape is repeated on every reciprocal lattice point. However, 
the diffraction condition for lattice points is usually not exactly satisfied, 
so each recorded Bragg spot represents a particular ‘slice’ of the Ewald 
sphere through the shape transform, giving a variety of Bragg spot 
profiles in a pattern; these are apparent in Fig. 2. The sum of counts 
in each Bragg spot underestimates the underlying structure factor 
square modulus, representing a partial reflection. 
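
The fringe-counting rule stated above (N − 2 fringes between neighbouring Bragg peaks for a crystal N unit cells across) can be turned into a size estimate in a few lines. The helper names are hypothetical; the 281-Å cell edge is the value quoted for a = b.

```python
def unit_cells_from_fringes(n_fringes):
    """Invert 'N - 2 fringes between neighbouring Bragg peaks' for N."""
    return n_fringes + 2

def crystal_size_nm(n_fringes, cell_edge_angstrom):
    """Projected crystal size along one lattice direction, in nanometres."""
    return unit_cells_from_fringes(n_fringes) * cell_edge_angstrom / 10.0

cells = unit_cells_from_fringes(7)
size = crystal_size_nm(7, 281.0)
print(f"{cells} unit cells, about {size:.0f} nm")
```

For seven fringes along b* this gives nine unit cells, or roughly 250 nm.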

Figure 3a shows strong single-crystal diffraction to the highest 
angles of the front detector. The nanocrystal shape transform is also 
apparent in many patterns at the high angles detected by the front 
detector, giving significant measured intensities between Bragg peaks 
as is noticeable in Supplementary Fig. 3a. These mid-Bragg intensities 
oversample the molecular transform, providing a potential route to 
phasing of the pattern’””*. 

Figure 2 | Coherent crystal diffraction. Low-angle diffraction patterns 
recorded on the rear pnCCDs, revealing coherent diffraction from the structure 
of the photosystem I nanocrystals, shown using a logarithmic, false-colour 
scale. The Miller indices of the peaks in a were identified from the 
corresponding high-angle pattern. In c we count seven fringes in the b* 
direction, corresponding to nine unit cells, or 250 nm. Insets, real-space images 
of the nanocrystal, determined by phase retrieval (using the Shrinkwrap 
algorithm’) of the circled coherent Bragg shape transform. 

In conventional crystallography, the ‘full’ Bragg reflection is determined 
to high precision, for example by integrating counts as the 
crystal is rotated such that these reflections pass through the diffraction 
condition. By indexing individual patterns and then summing 
counts in all partial reflections for each index, we performed a 
Monte Carlo integration over the reciprocal-space volume of the 
Bragg reflection and the distribution of crystal shapes and orientations 
and variations in the X-ray pulse fluence. The result of this procedure 
converges to the square of the structure factor moduli’*. We found that 
over 13% of diffraction patterns with ten or more spots could be 
consistently indexed using the programs MOSFLM” and DirAx’’ 
(Methods). Merged intensities at 70-fs pulse duration are presented 
as a precession-style image of the [001]-zone axis in Fig. 3b (see also 
Supplementary Figs 3 and 4). We tested the reliability of this approach 
by comparing the LCLS merged data with data collected at 100 K with 
12.4-keV synchrotron radiation from a single crystal of photosystem I 
cryopreserved in 2 M sucrose. These data sets show good agreement, 
with a difference metric, R_iso, of 22.1% computed over the entire 
resolution range and of less than 13% in the middle resolution shells; see 
Supplementary Table 1 for detailed statistics. 

To complete our proof of principle, we conducted a rigid-body 
refinement of the published photosystem I structure (Protein Data 
Bank ID, 1JB0) against the nanocrystal structure factors, yielding 
R/R_free = 0.25/0.23. A representative region of the 2mF_o − DF_c electron 
density map at 8.5 Å (Methods) from the LCLS data set is shown 
in Fig. 3c. This map shows the details expected at this resolution, 
including transmembrane helices, membrane extrinsic features and 
some loop structures. For comparison, the electron density refined 
from the 12.4-keV, single-crystal data set truncated to a resolution of 
8.5 Å is given in Fig. 3d. 

Figure 3 | Diffraction intensities and electron density of photosystem I. 
a, Diffraction pattern recorded on the front pnCCDs with a single 70-fs pulse 
after background subtraction and correction of saturated pixels. Some peaks are 
labelled with their Miller indices. The resolution in the lower detector corner is 
8.5 Å. b, Precession-style pattern of the [001] zone for photosystem I, obtained 
from merging femtosecond nanocrystal data from over 15,000 nanocrystal 
patterns, displayed on the linear colour scale shown on the right. c, d, Region of 
the 2mF_o − DF_c electron density map at 1.0σ (purple mesh), calculated from 
the 70-fs data (c) and from conventional synchrotron data truncated at a 
resolution of 8.5 Å and collected at a temperature of 100 K (d) (Methods). The 
refined model is depicted in yellow. 

The dose of 700 MGy corresponds to a K-shell photoabsorption of 
3% of all carbon atoms in the protein. This energy is subsequently 
released by photoionization and Auger decay, followed by a cascade 
of lower-energy electrons caused by secondary ionizations, taking 
place on the 10-100-fs timescale*’. Using a model of the plasma 
dynamics**”’, we calculated that by the end of a 100-fs pulse each atom 
of the crystal was ionized once, on average, and that motion of nuclei 
had begun. This is expected to give rise to a decrease in Bragg amplitudes, 
similar to an increase in a Debye-Waller temperature factor™*. 
We studied the effects of the initial ionization damage on the diffraction 
of photosystem I nanocrystals by collecting a series of data sets at 
pulse durations of 10, 70 and 200 fs. The 10-fs pulses were produced 
with lower pulse energy: ~10% of the total number of photons of the 
longer pulses’, or a 70-MGy dose. Plots of the scattering strength of 
the crystals versus resolution, generated by selecting and summing 
Bragg spots from more than 66,000 patterns for each of the three pulse 
durations measured, are shown in Fig. 4. The 10- and 70-fs traces are 
very similar, indicating that these pulses are short enough to overcome 
radiation damage at the observed resolution, 8.5 Å. For 200-fs pulses, 
there is a decrease in scattering strength at resolutions beyond 25 Å, 
indicating disordering on this longer timescale. The highest-resolution 
Bragg peaks for the 200-fs pulses were not broadened or shifted relative 
to the short-duration data sets, which indicates there was no strain or 
expansion of the lattice, respectively. 

Our next step is to improve resolution by using shorter-wavelength 
X-rays. Resolution may ultimately be limited by X-ray pulse fluence, 
the ultrafast radiation damage and the intrinsic disorder within the 
nanocrystals themselves. Recent experiments” at LCLS indicate a brief 
saturation of the X-ray photoabsorption of atoms in a tightly focused 
pulse, resulting in a decrease in photoionization damage on a 20-fs 
timescale without a reduction in the scattering cross-sections that give 
rise to the diffraction pattern”. Planned beamlines at LCLS aim to 
achieve up to a 10°-fold increase in pulse irradiance by tighter focusing, 
allowing data collection with low-fluence, 10-fs pulses or pulses of even 
shorter duration”. This provides a route to further reducing radiation 
damage and may allow measurements on even smaller nanocrystals, 
down to a single unit cell® (that is, a single molecule). As this limit is 
approached, the ordering of the nanocrystals will become increasingly 
irrelevant, as each crystal may be treated as a single object and the 
‘disorder’ that conventionally leads to reduced resolution will simply 
manifest itself as shot-to-shot variability, providing information about 
not just the average structure but also the range of dynamically accessible 
conformations. 

Figure 4 | Pulse-duration dependence of diffraction intensities. Plot of the 
integrated Bragg intensities of photosystem I nanocrystal diffraction as a 
function of photon momentum transfer, q = (4π/λ)sin(θ) = 2π/d (wavelength, 
λ; scattering angle, 2θ; resolution, d) for pulse durations of 10, 70 and 200 fs. 
Averages were obtained by isolating Bragg spots from 97,883, 805,311 and 
66,063 patterns, respectively, normalized to pulse fluence. The error in each 
plot is indicated by the thickness of the line. The decrease in irradiance for 
200-fs pulses and d < 25 Å indicates radiation damage for these long pulses, 
which is not apparent for 70-fs pulses and shorter. 

Data are collected on fully hydrated nanocrystals without cryogenic 
cooling. We expect that the results presented here will open new avenues 
for crystallography using X-ray laser pulses that are so short that only 
negligible X-ray-induced radiation damage occurs during data collec- 
tion. Significant improvements in sample utilization are expected by 
exploiting higher X-ray repetition rates or by slowing the liquid flow. 
For example, the generation, using inkjet technologies, of liquid droplets 
at a rate that matches the LCLS X-ray pulses would dramatically decrease 
the total required sample volume by a factor of 25,000, meaning that less 
than 0.4 µl of nanocrystal suspension would be needed in our particular 
case, of photosystem I. Further efficiency gains would result from 
indexing and merging a greater proportion of patterns into the 3D 
data set, which may be achieved by applying methods for merging 
continuous diffraction patterns of single molecules**”’ or by using 
‘post-refinement”® to obtain accurate structure factor estimates from 
fewer diffraction patterns. These methods will also remove the twinning 
ambiguity that exists in our current indexing scheme. Our method also 
has potential application to the study of chemical reactions, such as the 
processes in photosynthesis or enzymatic reactions. 


METHODS SUMMARY 


We made our measurements using the CFEL-ASG Multi-Purpose (CAMP) 
instrument’? on the Atomic, Molecular and Optical Science beamline”’ at the 
LCLS*. Diffraction data were recorded at the LCLS repetition rate of 30 Hz with 
a set of two movable, high-frame-rate, low-noise, X-ray pnCCD detector units”. 
The front detector, located 68 mm from the jet, accepts scattering angles up to 
47.9°, corresponding to a resolution of 8.5 Å at a wavelength of 6.9 Å. The rear unit 
was located 564 mm from the jet to record finer sampling of the diffraction pattern 
at low angles. 
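
The quoted resolution limits follow from Bragg's law, d = λ/(2 sin θ), where 2θ is the scattering angle. The sketch below checks the front-detector limit and, using the 0.1° to 4.0° range given for the rear detectors in the full Methods, the 4,000 to 100 Å range stated in the text. Treating those two rear-detector angles as scattering angles 2θ is an assumption.

```python
import math

def resolution_angstrom(wavelength_angstrom, two_theta_deg):
    """Bragg resolution d = lambda / (2 sin(theta)) for scattering angle 2*theta."""
    theta = math.radians(two_theta_deg / 2.0)
    return wavelength_angstrom / (2.0 * math.sin(theta))

def momentum_transfer(wavelength_angstrom, two_theta_deg):
    """q = (4 pi / lambda) sin(theta) = 2 pi / d, in inverse angstroms."""
    return 2.0 * math.pi / resolution_angstrom(wavelength_angstrom, two_theta_deg)

WAVELENGTH = 6.9  # angstrom

front_limit = resolution_angstrom(WAVELENGTH, 47.9)
rear_low = resolution_angstrom(WAVELENGTH, 0.1)
rear_high = resolution_angstrom(WAVELENGTH, 4.0)

print(f"front detector limit: {front_limit:.1f} angstrom")
print(f"rear detector range:  {rear_low:.0f} to {rear_high:.0f} angstrom")
print(f"q at the front limit: {momentum_transfer(WAVELENGTH, 47.9):.2f} per angstrom")
```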

The liquid jet was emitted from a capillary with an inner diameter of 40 µm and
focused by a coaxial flow of gas to a diameter of about 4 µm (ref. 9), flowing at
10 µl min⁻¹. The small jet diameter constrains the crystals to pass through the most
intense part of the focused X-ray beam. Clogging of nanocrystals in the capillary is 
avoided, and the coaxial gas sheath prevents freezing of the liquid in the vacuum 
environment. A micropore filter in the fluid delivery line was used to restrict the 
size of the photosystem I nanocrystals to less than 2 µm. The suspension was
diluted to observe a crystal ‘hit rate’ of 20% (Supplementary Fig. 2) to reduce 
the occurrence of double hits. The concentration of observed crystals was therefore 
0.2 per illuminated volume of 4 × 4 × 13 µm³, or about 10⁹ crystals per millilitre.


76 | NATURE | VOL 470 | 3 FEBRUARY 2011 


The overall protein concentration after dilution of the suspension was 1 mg ml⁻¹
(1 µM of the photosystem I trimer), and a complete set of structure factors was
obtained from 1,850,000 X-ray pulses. 

Diffraction peaks from the 70-fs data were identified, indexed and combined 
into a set of 3D structure factors comprising 3,379 unique reflections from 
2,424,394 spots. Statistics of the merged data are given in Supplementary Table 1. 


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 
METHODS


Experimental set-up. The experiments were performed at LCLS*, at SLAC, at the 
AMO beamline” in vacuo using the CAMP end station'’. X-ray pulses, generated 
at a repetition rate of 30 Hz, were focused to a spot with a full-width at
half-maximum of 7 µm (full-width of 13 µm at 10% maximum irradiance) and a pulse
fluence of 900 J cm⁻², corresponding to a peak power density (irradiance) in
excess of 10¹⁶ W cm⁻² at 70-fs duration. The pnCCD detectors were read out,
digitized and stored at the 30-Hz rate of the delivered LCLS pulses. Each detector
panel consists of 512 × 1,024 pixels, each 75 × 75 µm² in area. The rear detectors,
located 564 mm from the jet, record low-angle scattering from 0.1° to 4.0° in
the vertical scattering plane, and the front detectors, located 68 mm from the
jet, cover 4.6° to 40.5° in the same vertical plane. The largest scattering-angle
magnitude accepted by the front detector was 47.9°, corresponding to a resolution,
d, of 8.5 Å at a wavelength of 6.9 Å. X-ray fluorescence from the water jet was
filtered by an 8-µm-thick polyimide film in front of the pnCCDs.

A liquid microjet*” was used to inject the nanocrystal suspension into the FEL 
beam at a flow rate of 10 µl min⁻¹. The microjet was emitted from a
40-µm-diameter capillary and focused to a 4-µm-diameter column by a coaxial flow of
helium. The X-ray attenuation in the water was at most 30%. The interaction 
region of the X-rays and crystals is located in the continuous liquid column, before 
the Rayleigh break-up of the jet into drops, such that most of the X-ray scattering 
from the liquid is confined to a narrow vertical streak in reciprocal space. 

Crystallization conditions of photosystem I nanocrystals were established by 
determining the phase diagrams*'*’. Nanocrystals were grown in batches at 10 mg 
ml⁻¹ protein concentration (30 µM P700, or 10 µM photosystem I trimer) and low
ionic strength (8 mM MgSO₄, 5 mM MES, pH 6.4, and 0.02% β-dodecylmaltoside)
at 4 °C. The photosystem I nanocrystals were then suspended in harvesting buffer
(5 mM MES, pH 6.4, and 0.02% β-dodecylmaltoside) to establish a protein
concentration of 1 mg ml⁻¹. The crystal suspension was filtered through 2-µm cut-off
filters (In-line Filter, Upchurch) and stored at 4 °C until use in the experiment.

The nanocrystals are needles of hexagonal cross-section, with the long axis of 
the needle along the c axis and an aspect ratio (length to maximum diameter of 
hexagon) ranging from 1:1 to 2:1, as determined from reconstructing single-shot 
views of the whole crystal from their shape transforms. For example, Fig. 2a shows 
a view of the crystal almost perpendicular to the c axis, where we reconstruct a 
shape of aspect ratio 1.6:1. A view along the c axis (Fig. 2c) shows the hexagonal 
profile. Large, millimetre-sized, crystals of photosystem I have an aspect ratio of up 
to 5:1, which is seen to decrease with decreasing crystal size. 

The nanocrystal suspension was introduced directly into the microjet through a 
sample loop (Supplementary Fig. 1). A micropore filter in the fluid delivery line was 
used to restrict the size of the nanocrystals to less than 2 µm. The suspension was
diluted to observe a crystal ‘hit rate’ of 20% (Supplementary Fig. 2), to minimize the
occurrence of double hits. The observed concentration of crystals was therefore 0.2
per illuminated volume of 4 × 4 × 13 µm³, or 10⁹ crystals per millilitre. The overall
photosystem I protein concentration after dilution was 1 mg ml⁻¹, and a complete
set of structure factors was obtained from 1,850,000 X-ray pulses, or 10 mg of 
protein. With the current set-up, at the 30-Hz X-ray pulse rate less than 0.004% 
of the continuously flowing solution was exposed to the X-ray beam, so only one in 
25,000 nanocrystals was actually hit by an X-ray pulse. 
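
The sample-consumption figures in this paragraph can be checked with a few lines of arithmetic. The sketch below uses only the values quoted in the text; the variable names are illustrative.

```python
# Values quoted in the text
pulses = 1_850_000           # X-ray pulses in the full data set
rep_rate_hz = 30.0           # LCLS repetition rate
flow_ul_per_min = 10.0       # liquid jet flow rate
protein_mg_per_ml = 1.0      # protein concentration after dilution
hit_rate = 0.2               # fraction of pulses that hit a crystal
illuminated_um3 = 4 * 4 * 13 # illuminated volume in cubic micrometres

# Total suspension that flowed past the beam during data collection
run_minutes = pulses / rep_rate_hz / 60.0
volume_ml = run_minutes * flow_ul_per_min / 1000.0
protein_mg = volume_ml * protein_mg_per_ml

# Crystal concentration implied by the hit rate (1 µm^3 = 1e-12 ml)
crystals_per_ml = hit_rate / (illuminated_um3 * 1e-12)

# Volume needed if only the exposed fraction (1 in 25,000) had to be delivered
exposed_fraction = 1 / 25_000
ideal_volume_ul = volume_ml * 1000.0 * exposed_fraction

print(f"protein used: {protein_mg:.1f} mg")
print(f"crystal concentration: {crystals_per_ml:.1e} per ml")
print(f"ideal sample volume: {ideal_volume_ul:.1f} µl")
```

The result is roughly 10 mg of protein, about 10⁹ crystals per millilitre, and about 0.4 µl of suspension if every crystal were hit.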

Details of the acquisition of diffraction patterns and the primary data reduction 
are given in Supplementary Methods. 


3D merging of intensities. Peaks in the processed patterns were located in each 
pattern using the algorithm of ref. 33, and their locations were mapped into three 
dimensions according to the curvature of the Ewald sphere, the calibrated detector 
geometry and the X-ray wavelength. The 3D peak locations for each pattern in 
turn were presented to the auto-indexing program DirAx”. If DirAx succeeded in 
finding a unit cell for the peaks, linear combinations of the cell basis vectors were 
checked for correspondence with the photosystem I unit cell’ from the Protein 
Data Bank (ID, 1JB0). If a match was found, pixel intensities were summed within
a circle of ten-pixel radius centred on the pixel closest to each located Bragg 
condition. Patterns were rejected if fewer than 10% of the detected peaks were 
accounted for by unit-cell parameters from DirAx. From 1,850,000 recorded 
patterns, we identified 112,725 as hits (more than ten detected peaks) and 
15,445 were successfully indexed. New peak-finding and -indexing algorithms 
are under development and are expected to increase significantly the number of 
patterns that can be indexed, thereby further reducing the number of protein 
crystals required for a useful data set. The variation of pixel solid angle across 
the detector plane was accounted for, as was polarization of the X-ray beam 
assuming complete horizontal polarization. A list of reflection indices and intensities 
was produced for each individual diffraction pattern, and merging was performed by 
taking the mean value for the intensity of each unique reflection. Because the index- 
ing algorithm makes use of the positions of the peaks but not their intensities, it was 
unable to distinguish between crystal orientations related by the symmetry of the 
lattice. As the symmetry of the lattice is higher than that of the actual structure of 
photosystem I, an ambiguity exists in that each pattern could correspond to one of 
two possible orientations. For programmatic convenience, these data (with 
actual space group symmetry P6₃) were merged as P6₃22 and treated as though
merohedrally twinned during refinement (see below). A 3D rendering of the final 
full data set is shown in Supplementary Fig. 4. 
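
The merging step described above, taking the mean intensity of each unique reflection over all indexed patterns, can be sketched as follows. This is a simplified illustration, not the authors' code: it assumes each observation already carries Miller indices reduced to the unique set (symmetry reduction is omitted), and the example intensities are made up.

```python
from collections import defaultdict

def merge_intensities(patterns):
    """Average the intensity of each unique reflection over all indexed patterns.

    patterns: iterable of lists of ((h, k, l), intensity) tuples, one list per pattern.
    Returns {(h, k, l): (mean_intensity, n_observations)}.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for reflections in patterns:
        for hkl, intensity in reflections:
            sums[hkl] += intensity
            counts[hkl] += 1
    return {hkl: (sums[hkl] / counts[hkl], counts[hkl]) for hkl in sums}

# Made-up observations, purely for illustration
example_patterns = [
    [((1, 0, 2), 120.0), ((2, 1, 0), 40.0)],
    [((1, 0, 2), 80.0)],
    [((2, 1, 0), 60.0), ((0, 0, 4), 10.0)],
]
merged = merge_intensities(example_patterns)
for hkl, (mean_i, n) in sorted(merged.items()):
    print(hkl, mean_i, n)
```

Because each pattern is a still snapshot of a randomly oriented crystal, a simple mean over many observations is what averages out partial reflections and shot-to-shot variation in this scheme.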

Data quality. Metrics of the merged data quality are shown in Supplementary 
Table 1 and discussed in Supplementary Information. We carried out a rigid-body 
refinement of the published photosystem I structure (Protein Data Bank ID, 1JB0) 
to the merged structure factors using the program REFMAC* in twin mode. The 
refinement R and Rfree values were 0.25 and 0.23, respectively. The 2mFo − DFc
electron density map*’ at a resolution of 8.5 Å is shown in Fig. 3c. The
corresponding 2mFo − DFc electron density map from the conventional synchrotron
data, truncated to a resolution of 8.5 Å, is shown in Fig. 3d. The electron density
maps show the large subunits PsaA and PsaB, as well as the membrane extrinsic 
subunits. The transmembrane helices, and even some loop structures, are clearly 
visible. In these figures, the ribbon representation of the protein model is shown in 
yellow and the atoms of three iron-sulphur clusters are depicted in red. 
X-ray lasers offer new capabilities in understanding the structure of
biological systems, complex materials and matter under extreme 
conditions’ *. Very short and extremely bright, coherent X-ray pulses 
can be used to outrun key damage processes and obtain a single 
diffraction pattern from a large macromolecule, a virus or a cell 
before the sample explodes and turns into plasma’. The continuous 
diffraction pattern of non-crystalline objects permits oversampling 
and direct phase retrieval’. Here we show that high-quality diffrac- 
tion data can be obtained with a single X-ray pulse from a non- 
crystalline biological sample, a single mimivirus particle, which 
was injected into the pulsed beam of a hard-X-ray free-electron laser, 
the Linac Coherent Light Source’. Calculations indicate that the 
energy deposited into the virus by the pulse heated the particle to 
over 100,000 K after the pulse had left the sample. The reconstructed 
exit wavefront (image) yielded 32-nm full-period resolution in a 
single exposure and showed no measurable damage. The reconstruc- 
tion indicates inhomogeneous arrangement of dense material inside 
the virion. We expect that significantly higher resolutions will be 
achieved in such experiments with shorter and brighter photon 
pulses focused to a smaller area. The resolution in such experiments 
can be further extended for samples available in multiple identical 
copies. 

Diffraction studies of crystalline samples have led to spectacular 
breakthroughs in physics, chemistry and biology over the past hundred 
years. Many important targets are difficult or impossible to crystallize, 
and this creates systematic blank areas in the structural sciences. X-ray 
lasers offer the possibility of stepping beyond X-ray crystallography, to 
extend structural studies to single, non-crystalline particles or mol- 
ecules’. In this Letter, we present results on biological imaging with 


an X-ray free-electron laser, and bring together all the elements 
required for structural studies of single, non-crystalline objects. 

Mimivirus (Acanthamoeba polyphaga mimivirus) is the largest 
known virus*. Its size is comparable to the size of the smallest living cells 
(in fact, the name mimivirus stands for ‘microbe-mimicking virus’). The 
viral capsid (0.45 µm in diameter) has a pseudo-icosahedral appearance
and is covered by an outer layer of dense fibrils”*. The total diameter of 
the particle, including fibrils, is about 0.75 µm. Mimivirus is too big for a
full three-dimensional reconstruction by cryo-electron microscopy’ and 
its fibrils prevent crystallization. The genome’ has 1.2 million base pairs 
(comparable to a small bacterium) and contains several genes previously 
thought to be present only in cellular organisms, including components 
of the protein translation apparatus. Mimivirus can be infected by a 
smaller virus, named a ‘virophage’®, which seems to be the first example 
of a virus behaving as a parasite of another virus®. Studies of mimivirus 
are causing a paradigm shift in virology and have led to renewed debates 
about the origin and the definition of viral and cellular life’. 

Figure 1 shows the experimental arrangement for imaging single 
virus particles. The sample injector, which uses aerodynamic focusing, 
was mounted into the CFEL-ASG Multi-Purpose (CAMP) instru- 
ment” on the Atomic, Molecular and Optical Science (AMO) beam- 
line’ at the Linac Coherent Light Source’ (LCLS). We recorded far-field 
diffraction patterns at a reduced pressure (10 °mbar) to minimize 
background scattering. Mimivirus was aerosolized from a volatile buffer 
(250 mM ammonium acetate, pH 7.5) using a gas dynamic nebulizer'* 
in a helium atmosphere. The beam of adiabatically cooled virus particles
was guided through an aerodynamic lens stack (similar to the one 
described in ref. 15) and entered the interaction zone with an estimated 
velocity of 60–100 m s⁻¹. The particles were intercepted randomly by
[Figure 1 labels: Aerosol sample injector; LCLS X-ray pulses; Detector assembly]


Figure 1 | The experimental arrangement. Mimivirus particles were injected 
into the pulse train of the LCLS at the AMO experimental station’* with a 
sample injector built in Uppsala. The injector was mounted into the CAMP 
instrument’*. The aerodynamic lens stack is visible in the centre of the injector 
body, on the left. Particles leaving the injector enter the vacuum chamber and 
are intercepted randomly by the LCLS pulses. The far-field diffraction pattern 
of each particle hit by an X-ray pulse is recorded on a pair of fast p–n junction
charge-coupled device (pnCCD) detectors'’. The intense, direct beam passes 
through an opening in the centre of the detector assembly and is absorbed 
harmlessly behind the sensitive detectors. Some of the low-resolution data also 
go through this gap and are lost in the current set-up. 


the LCLS pulses. The X-ray energy was 1.80 keV (6.9-Å wavelength)
and the pulse length was 70 fs (full-duration at half-maximum). The
X-ray beam diameter at the interaction point was about 10 µm
(full-width at half-maximum), with a maximum of 1.6 × 10¹⁰ photons per
square micrometre in the centre of this beam. This translates to a peak
power density of 6.5 × 10¹⁵ W cm⁻². Forward-scattered diffraction
patterns were recorded on a pair of pnCCD detectors’*. The direct beam
exited through an opening between the two detector halves and was
absorbed in a beam dump behind the detectors (Fig. 1). The detector
pair was placed 564 mm away from the interaction point, giving
maximum full-period resolutions of 10.2 nm at the edges and 7.2 nm
at the corners of the compound detector at 1.8-keV photon energy.
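
The edge and corner resolutions quoted above can be reproduced from the detector geometry. The sketch below rests on assumptions that are not stated in this paragraph: the two detector halves are treated as one 76.8 mm square centred on the beam (the gap between the halves is ignored), and the wavelength is taken as 0.69 nm. The function name is illustrative.

```python
import math

def full_period_resolution_nm(wavelength_nm, radius_mm, distance_mm):
    """Full-period resolution d = wavelength / (2 sin(theta)).

    The scattering angle 2*theta is set by a detector point at `radius_mm`
    from the beam axis, on a flat detector `distance_mm` from the sample.
    """
    two_theta = math.atan2(radius_mm, distance_mm)
    return wavelength_nm / (2.0 * math.sin(two_theta / 2.0))

wavelength_nm = 0.69      # 6.9 Å, as quoted in the text
distance_mm = 564.0       # sample-to-detector distance
half_width_mm = 38.4      # assumed: half of a 76.8 mm square detector pair

edge = full_period_resolution_nm(wavelength_nm, half_width_mm, distance_mm)
corner = full_period_resolution_nm(
    wavelength_nm, half_width_mm * math.sqrt(2.0), distance_mm
)
print(f"edge: {edge:.1f} nm, corner: {corner:.1f} nm")
```

Under these assumptions the edge and corner values agree with the 10.2 nm and 7.2 nm quoted in the text.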

Figure 2a, b shows single-shot X-ray diffraction patterns of individual
mimivirus particles, and Fig. 2c shows a transmission electron micro- 
graph of a single mimivirus particle. Each of the diffraction patterns 
contains about 1,700,000 scattered photons. The lowest-resolution data 
are missing between the two detector halves, so the total number of 
scattered photons exceeds this number. Figure 2d, e shows autocorrela- 
tion functions calculated from the diffraction patterns. Missing low- 
resolution data act as a high-pass filter. For an object of extent D, the 
extent of its autocorrelation is 2D and the diffraction intensities are 
band-limited with a Nyquist rate of 1/2D. The size and shape of the 
autocorrelation functions in Fig. 2d, e are indicative of hits on single 
virus particles. Figure 2f, g shows the reconstructed exit wavefronts for 
these mimivirus particles. The shapes and sizes of the reconstructed 
objects agree with data from prior cryo-electron microscopy studies in 
which 30,000 images were averaged’. In contrast, the reconstructed 
structures in Fig. 2f, g come from single shots from single particles, 
and demonstrate the power of this new imaging concept’. 
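
The statement that an object of extent D has an autocorrelation of extent 2D can be illustrated in one dimension. In the experiment the autocorrelation is obtained from the measured diffraction intensities; the toy sketch below instead correlates an arbitrary sampled density with itself directly in real space, which gives the same function. All names and values are illustrative.

```python
def autocorrelation(density):
    """Linear autocorrelation of a real 1D sequence for lags -(n-1) .. (n-1)."""
    n = len(density)
    return [
        sum(density[i] * density[i + lag] for i in range(n) if 0 <= i + lag < n)
        for lag in range(-(n - 1), n)
    ]

def support_width(values, tol=1e-12):
    """Number of samples from the first to the last non-zero entry, inclusive."""
    nonzero = [i for i, v in enumerate(values) if abs(v) > tol]
    return nonzero[-1] - nonzero[0] + 1 if nonzero else 0

# Toy object: 5 non-zero samples padded with zeros (values are arbitrary)
obj = [0.0] * 10 + [1.0, 2.0, 3.0, 2.0, 1.0] + [0.0] * 10
ac = autocorrelation(obj)

print(support_width(obj), support_width(ac))
```

For a sampled object of N non-zero points the autocorrelation spans 2N − 1 points, the discrete counterpart of 2D. This is why the size of the autocorrelation can be used to judge whether a pattern comes from a single particle.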

We performed image reconstruction by iterative phase retrieval 
implemented in the Hawk software package'’, using the RAAR algo- 
rithm” enhanced with both reality and positivity constraints. The sup- 
port was handled by a Shrinkwrap algorithm'* with the constraint of 
having a specific area that was estimated from the autocorrelation func- 
tion. Weakly constrained modes in the reconstructions were identified 
and removed, using the formalism of ref. 19. This is a linear algebra 
method to compensate for noise, or the lack of constraints in the missing 
central region of the pattern. The uncertainty in the overall density was 
less than 10% after the identification and removal of the unconstrained 
modes. We then fitted these modes to match the total density of a 
spherical or a suitably rotated icosahedral profile. The missing modes 
were adjusted to give a total density that best matched the target. 
Residual phase fluctuations were then suppressed by averaging many
reconstructions, using different random seeds. The results gave 
improved image reliability. For details, see Methods. 

We estimated the image resolution in the reconstruction by com- 
puting the phase retrieval transfer function*” (PRTF; Fig. 2h, i), which 
represents the confidence in the retrieved phases as a function of 
resolution. No consensus has emerged so far on what single PRTF 
value should be used as the measure of resolution (values between 
0.5 and 0.1 can be found in the literature; see Methods). We characterize 
resolution by the point where the PRTF drops to 1/e (ref. 20), and this 
corresponds to a full-period resolution of 32 nm in both cases. We
expect significantly higher resolutions in such experiments with shorter 
and brighter photon pulses focused to a smaller area. 
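
A PRTF of the kind described above can be sketched as follows. The sketch assumes one commonly used definition, the modulus of the averaged complex amplitudes divided by the average modulus over independent reconstructions, which may differ in detail from the definition in ref. 20. The toy data and function names are illustrative only.

```python
import cmath
import math

def prtf(phased_amplitudes):
    """PRTF per resolution shell.

    phased_amplitudes[k][s] is the complex Fourier amplitude of reconstruction k
    in shell s. Consistent phases give values near 1, random phases near 0.
    """
    n_rec = len(phased_amplitudes)
    n_shells = len(phased_amplitudes[0])
    result = []
    for s in range(n_shells):
        mean_complex = sum(rec[s] for rec in phased_amplitudes) / n_rec
        mean_modulus = sum(abs(rec[s]) for rec in phased_amplitudes) / n_rec
        result.append(abs(mean_complex) / mean_modulus if mean_modulus else 0.0)
    return result

def first_shell_below(values, threshold=1.0 / math.e):
    """Index of the first shell whose PRTF falls below the threshold, or None."""
    for s, v in enumerate(values):
        if v < threshold:
            return s
    return None

# Toy data: 4 reconstructions, 3 shells with increasingly inconsistent phases
recs = [
    [cmath.rect(1.0, 0.0), cmath.rect(1.0, 0.0), cmath.rect(1.0, 0.0)],
    [cmath.rect(1.0, 0.0), cmath.rect(1.0, 0.5), cmath.rect(1.0, math.pi / 2)],
    [cmath.rect(1.0, 0.0), cmath.rect(1.0, -0.5), cmath.rect(1.0, math.pi)],
    [cmath.rect(1.0, 0.0), cmath.rect(1.0, 0.0), cmath.rect(1.0, 3 * math.pi / 2)],
]
curve = prtf(recs)
cutoff = first_shell_below(curve)
```

In practice the shells would be radial averages of two-dimensional reconstructions, and the resolution is the full-period length corresponding to the cut-off shell.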

In principle, resolution could reach less than 1 nm in a single expo- 
sure with a biological object of similar size to the mimivirus particle’. 
This resolution would require a free-electron laser pulse shorter than 
about 5 fs at 1.8-keV energy and a photon flux on the sample of more 
than 3 × 10¹¹ photons per square micrometre®. This pulse length and
photon flux are beyond the initial capabilities of the LCLS, although 
there have already been indications of nearly transform-limited LCLS 
pulses lasting only a few femtoseconds and containing about 5 X 10"? 
photons per pulse in the unfocused beam”’. 

With very short pulses, exposures could be over before there is time 
for significant Auger emission or for the development of secondary 
electron cascades in the sample’. The conventional handicap of X-rays 
relative to electrons in imaging could thus be reversed and made into a 
net gain over a broad range of sample sizes. First experiments at the 
LCLS show a significant drop in the photoelectric cross-section of 
hollow atoms”. This effect was predicted earlier’, but it is larger than 
expected and can already be measured with LCLS pulses 20-80 fs in 
duration”. The results show photoabsorption decreased 20-fold in 
hollow neon to equal the cross-section of coherent scattering’. In 
addition, neon ions with double core holes had an extended lifetime”. 
At 1.8-keV photon energy, more than 90% of the total photoelectric 
cross-section of carbon, nitrogen and oxygen can be attributed to 1s 
electrons. Ejection of these electrons at the beginning of an intense and 
short pulse could practically stop photoionization without signifi- 
cantly changing the elastic cross-sections of outer-shell electrons. 

We see no measurable sample deterioration. With the X-ray pulses 
used in this study, the explosion of micrometre-sized objects is hydro- 
dynamic’ and the sample burns from the outside inwards, rarefying and 
destroying outer contours first. Trapped electrons move inwards to neu- 
tralize an increasingly positive core, and leave behind a positively charged 
outer layer, which then peels off over some picoseconds”’. The recon- 
structed exit wavefront of the mimivirus particle shows well-defined outer 
contours and gives a sample size consistent with the intact virus capsid 
(we do not expect to see the thin viral fibrils at the length scales accessible 
here). Other studies of protein nanocrystals* at the LCLS at 0.9-nm 
resolution show no measurable deterioration of Bragg peaks during illu- 
mination with pulses similar to those used here. The size of these protein 
nanocrystals was similar to the size of the mimivirus particles. 

At this stage, it is unclear how reproducible the interior structure
of mimivirus particles (or that of any other viral particles) is in terms of
atomic positions, and this will need further study. The viral inner
capsid consists of a thin protein shell (about 7 nm thick) lined with 
phospholipid membranes. The structure of the protein shell seems to 
be reproducible to at least 6.5 nm resolution’. Figure 2d, e suggests an 
inhomogeneous interior structure for the virion. The interior structure 
does not necessarily follow the pseudo-icosahedral outer shape (the 
capsid is believed to have a single, five-fold symmetry axis’). 

The penetration depth of X-rays permits studies on the interiors of 
large objects. The methods applied here require no modifications to 
the sample such as staining, freezing, sectioning, radiolabelling or 
crystallization, and can also be used to image cells that are alive at 
the time of the exposure. The amount of missing data can be reduced 
by adding an additional detector pair behind the first pair. Another 
Figure 2 | Single-shot diffraction patterns on single virus particles give
interpretable results. a, b, Experimentally recorded far-field diffraction 
patterns (in false-colour representation) from individual virus particles 
captured in two different orientations. c, Transmission electron micrograph of 
an unstained Mimivirus particle, showing pseudo-icosahedral appearance’. 

d, e, Autocorrelation functions for a (d) and b (e). The shape and size of each 
autocorrelation correspond to those of a single virus particle after high-pass 
filtering due to missing low-resolution data. f, g, Reconstructed images after 
iterative phase retrieval with the Hawk software package’®. The size of a pixel 
corresponds to 9 nm in the images. Three different reconstructions are shown 
for each virus particle: an averaged reconstruction with unconstrained Fourier 


necessary improvement is to increase the dynamic range of the detec- 
tors. In our experiments, there were shots extending to significantly 
higher resolutions than those reported here but they contained too 
many saturated pixels at low angles (more missing modes), prevent- 
ing image reconstruction. With reproducible samples, where the experi- 
ment can be repeated on a new object, a three-dimensional data set can 
be collected, and the resolution extended (even from weak individual 
modes"” and two averaged images after fitting unconstrained low-resolution
modes to a spherical or an icosahedral profile, respectively. The orientation of 
the icosahedron was determined from the diffraction data. The results show 
small differences between the spherical and icosahedral fits. h, i, The PRTF for 
reconstructions where the unconstrained low-resolution modes were fitted to 
an icosahedron. All reconstructions gave similar resolutions. We characterize 
resolution by the point where the PRTF drops to 1/e (ref. 20). This corresponds 
to 32-nm full-period resolution in both exposures. Arrows mark the resolution 
range with other cut-off criteria found in the literature (Methods). Resolution 
can be substantially extended for samples available in multiple identical 
copies'?>*8, 


exposures) by merging redundant data**~’. Studies of virus particles 
with higher-intensity photon pulses and improved detectors could 
answer the question of whether the core is reproducible to subnano- 
metre resolution or whether the viral genome has the ‘molecular indi- 
vidualism’ that genomic DNA structures explore in vitro”. 

Note added in proof: In a previous study”’, synchrotron radiation was 
used to obtain X-ray diffraction data on a herpes virus. 
METHODS SUMMARY


Experiments were performed with the CAMP instrument” on the AMO beam- 
line’* at the LCLS®, with the LCLS running at a repetition rate of 30 Hz. CAMP 
supports a variety of imaging and atomic/molecular physics experiments. 

Diffraction patterns were recorded on a pair of pnCCD detectors’* (maximum 
read-out speed, 250 frames per second). The sample-to-detector distance was
564 mm. The active area of each detector half was 76.8 mm × 38.4 mm and
contained 1,024 × 512 pixels of area 75 × 75 µm². The full-well capacity of a pixel was
280,000 electrons, corresponding to ~570 X-ray photons per pixel at 1.8-keV
photon energy.
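
The conversion from full-well capacity to photons per pixel can be checked with a short calculation. The sketch assumes a mean energy of about 3.65 eV per electron–hole pair in silicon, a value that is not given in the text; the names are illustrative.

```python
# Assumed, not from the text: mean energy to create one electron-hole pair in silicon
PAIR_CREATION_ENERGY_EV = 3.65

def photons_to_saturate(full_well_electrons, photon_energy_ev):
    """Number of photons of a given energy that fill one pixel's full well."""
    electrons_per_photon = photon_energy_ev / PAIR_CREATION_ENERGY_EV
    return full_well_electrons / electrons_per_photon

n_photons = photons_to_saturate(280_000, 1_800)
print(round(n_photons))
```

Under that assumption the result is close to the ~570 photons per pixel quoted above, which sets the dynamic range available for the intense low-angle signal.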

The electron bunch was 70 fs long (full-duration at half-maximum), but the 
corresponding photon bunch is thought to be shorter”. The photon bunch
contained 8 × 10¹¹ photons per pulse (0.24 mJ at 1.8 keV) and had a diameter of
~10 µm (full-width at half-maximum) at the interaction point, giving
~1.6 × 10¹⁰ photons per square micrometre in the centre of the beam and a peak
power density of ~6.5 × 10¹⁵ W cm⁻². Background scattering from residual gas in
the vacuum chamber did not exceed the read-out noise of the detectors nor the 
noise of the diffuse photon background (<1.3 photons per pixel). This is remarkable, 
considering that the number of photons in the pulse was nearly 100,000,000,000 
times higher than the photon background. 

Purified mimivirus was transferred into a volatile buffer and the suspension was 
aerosolized with helium gas in a gas dynamic nebulizer’. The aerosol of hydrated 
virus particles was sampled into a differentially pumped injector through an inlet 
nozzle coupled to a skimmer. The aerosol (in helium atmosphere) passed through 
a variable relaxation chamber from where the equilibrated and adiabatically cooled 
particles entered a differentially pumped aerodynamic lens stack. Particles focused 
by the aerodynamic lens were intercepted randomly by the LCLS pulses. 
Diffraction patterns of free-flying virus particles were exceptionally clean. 

Image reconstruction was performed with the open source Hawk software’®, 
available from http://xray.bmc.uu.se/hawk. The background-corrected diffraction 
patterns and the Hawk configuration files used in the reconstructions can be 
downloaded from this site. 


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 
METHODS

Experimental set-up at the LCLS. Experiments were performed with the CAMP 
instrument’? on the AMO beamline’? at the LCLS*, with the LCLS running at a 
repetition rate of 30 Hz. CAMP supports a variety of imaging and atomic/molecular 
physics experiments. 

Diffraction patterns were recorded on a pair of pnCCD detectors’* (maximum 
read-out speed, 250 frames per second). The sample-to-detector distance was
564 mm. The active area of each detector half was 76.8 mm × 38.4 mm and
contained 1,024 × 512 pixels of area 75 × 75 µm². The full-well capacity of a pixel was
280,000 electrons, corresponding to ~570 X-ray photons per pixel at 1.8-keV
photon energy.

The electron bunch was 70 fs long (full-duration at half-maximum), but the 
corresponding photon bunch is thought to be shorter”. The photon bunch
contained 8 × 10¹¹ photons per pulse (0.24 mJ at 1.8 keV) and had a diameter of
~10 µm (full-width at half-maximum) at the interaction point, giving
~1.6 × 10¹⁰ photons per square micrometre in the centre of the beam and a peak
power density of ~6.5 × 10¹⁵ W cm⁻². Background scattering from residual gas in
the vacuum chamber did not exceed the read-out noise of the detectors nor the 
noise of the diffuse photon background (<1.3 photons per pixel). This is remark- 
able, considering that the number of photons in the pulse was nearly 
100,000,000,000 times higher than the photon background. 
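
The pulse energy and peak power density quoted in this paragraph follow from the photon numbers. The sketch below assumes a flat-top pulse, dividing the central fluence by the 70-fs duration; this is a simplification and not necessarily how the authors computed the value. Variable names are illustrative.

```python
ELECTRON_VOLT_J = 1.602176634e-19  # joules per electronvolt

photon_energy_j = 1800.0 * ELECTRON_VOLT_J   # one 1.8-keV photon

# Energy of a pulse containing 8e11 photons, in millijoules
pulse_energy_mj = 8e11 * photon_energy_j * 1e3

# Central fluence: photons per µm² converted to J per cm² (1 cm² = 1e8 µm²)
peak_fluence_j_cm2 = 1.6e10 * photon_energy_j * 1e8

# Assumption: flat-top pulse of 70 fs duration
peak_power_w_cm2 = peak_fluence_j_cm2 / 70e-15

print(f"pulse energy: {pulse_energy_mj:.2f} mJ")
print(f"peak power density: {peak_power_w_cm2:.1e} W/cm^2")
```

This gives about 0.23 mJ per pulse and a peak power density of roughly 6.6 × 10¹⁵ W cm⁻², consistent with the rounded values in the text.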

Purified mimivirus* particles were transferred into a volatile buffer (250 mM 
ammonium acetate, pH 7.5) and the suspension (107 particles per millilitre) was 
aerosolized at a rate of about 5 µl min⁻¹, using helium gas in a gas dynamic
nebulizer’. The aerosol of hydrated virus particles was sampled into a differentially 
pumped sample injector through an inlet nozzle coupled to a skimmer. Most of the 
nebulizing gas, and vapours of the volatile buffer, were pumped away at this point. 
The heavier aerosol (in a wet helium atmosphere at a pressure of about 10°? mbar) 
passed through a variable-volume relaxation chamber from where the equilibrated 
and adiabatically cooled particles entered a differentially pumped aerodynamic lens 
stack. The pressure dropped from about 10°” mbar to about 10° * mbar at the exit 
of the lens. Particles focused by the aerodynamic lens entered the interaction zone 
(10 °mbar) with an estimated velocity of 60–100 m s⁻¹ and were intercepted
randomly by the LCLS pulses. 

Data processing included removal of signal from known bad or saturated pixels, 
correction for the residual common mode offsets and application of a flat-field 
correction. The corrected patterns were used directly without symmetrization. 

Image reconstruction was performed with the open-source Hawk software 
package’®, using the RAAR algorithm” and its support constraint, enhanced by 
additional reality and positivity constraints. Hawk is available from http:// 
xray.bmc.uu.se/hawk. The background-corrected diffraction patterns and the 
Hawk configuration files used in the reconstructions can be downloaded from 
this site. 

A Fourier constraint was applied to match Fourier amplitudes with experi- 
mental amplitudes through a projection. No explicit Fourier constraints were used 
for regions of missing data, although these regions were implicitly constrained in 
Fourier space by the real-space constraints. The support was handled using a 
Shrinkwrap algorithm'* with the constraint of having a specific area that was 
estimated from the autocorrelation function. 

Weakly constrained modes in the reconstructions were identified and removed, 
using the formalism of ref. 19. This is a linear algebra method to compensate for 
noise, or the lack of constraints in the missing central region of the pattern. The 
diffracted amplitudes in the region of missing data can be recovered by iterative 
phasing algorithms, but for patterns where this region is extensive the recovered 
amplitudes will be unreliable’’. Missing modes were identified and their constrain- 
ing power was calculated by performing a singular-value decomposition on the 
transform from the region of missing data to the support. The singular values 
identify the modes that are most weakly constrained, as the singular vectors, and 
determine their constraining power. In the patterns discussed in this paper, there 
are modes with very low constraining power. These modes are therefore virtually 
unconstrained and their strength had to be estimated in another way. The threshold 
used for identifying unconstrained modes was 0.999, corresponding to a constrain- 
ing power of 0.045. The uncertainty in the total image density dropped to less than 
10% after removing these modes. Missing modes were fitted to match the total 
density of a spherical or a suitably rotated icosahedral profile. The missing modes 
were adjusted to give a total density that best matched the target. 


The number of weakly constrained modes that were classified as unconstrained 
differs slightly between the two reconstructions because the support recovered 
through the Shrinkwrap algorithm” is slightly different for Fig. 2a and Fig. 2b. For 
the reconstruction starting from Fig. 2a, the median number of missing modes was 
8 and the average was 7.75 modes. For the reconstruction starting from Fig. 2b, the 
number was slightly higher: median 12 and average 11.85. This difference is due to 
a larger area being missing in the centre of Fig. 2b owing to there being more 
saturated pixels. 

Residual phase fluctuations were suppressed by averaging many reconstruc- 
tions, using different random seeds. The results gave improved image reliability. 
For reconstructions from Fig. 2a, 10,000 iterations were used and 200 reconstruc- 
tions were obtained from different random starting positions and then used to 
calculate the PRTF. The support was updated every 20 iterations. All 200 recon- 
structions had a Fourier error below a threshold of 0.33. For reconstructions from 
Fig. 2b, 40,000 iterations were performed and 94 reconstructions obtained. Of 
these, only 56 reconstructions had a Fourier error below a threshold of 0.33. The 
differences underline the deleterious effect of missing low-resolution data on 
image reconstruction. Reliable image reconstruction needs a more efficient way 
of measuring low-angle diffraction data, including a wider dynamic range for the 
detector. An attenuator disk centred on the X-ray beam and placed over the middle 
part of the pnCCD detector pair could reduce the strong forward-scattered signal 
in the middle of the diffraction pattern and bring low-resolution data within the 
useful dynamic range of the detector. 

We estimate the image resolution in the reconstruction by computing the
PRTF. No consensus has emerged so far on what single PRTF value should
be used as the measure of resolution. Values between 0.5 and 0.1 can be found in
the literature. We characterize resolution by the point where the PRTF drops to
1/e (ref. 20). Diffraction data extend to higher resolution than the resolution given 
by the PRTF. 
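
The 1/e criterion can be illustrated with a short Python sketch. It assumes a simplified PRTF: for each resolution shell, the modulus of the complex amplitude averaged over independent reconstructions, divided by the mean modulus. Using one amplitude per shell, this normalization, and the function names below are assumptions made for illustration, not the procedure used in the paper.

```python
import math


def prtf(reconstructions):
    """Simplified PRTF per resolution shell.

    reconstructions: list of equal-length lists of complex amplitudes,
    one list per reconstruction, indexed by resolution shell.
    """
    n_shells = len(reconstructions[0])
    values = []
    for j in range(n_shells):
        amps = [rec[j] for rec in reconstructions]
        mean_amp = sum(amps) / len(amps)                  # phases may cancel
        mean_mod = sum(abs(a) for a in amps) / len(amps)  # phases ignored
        values.append(abs(mean_amp) / mean_mod if mean_mod > 0 else 0.0)
    return values


def resolution_shell(prtf_values, level=1 / math.e):
    """Index of the first shell where the PRTF drops below `level`, or None."""
    for j, v in enumerate(prtf_values):
        if v < level:
            return j
    return None


# Two hypothetical reconstructions: phases agree in shell 0,
# differ by 90 degrees in shell 1, and are opposite in shell 2.
recs = [[1 + 0j, 1 + 0j, 1 + 0j],
        [1 + 0j, 1j, -1 + 0j]]
values = prtf(recs)
print(values, resolution_shell(values))
```

Where reconstructions agree in phase the ratio stays near 1; where phases are inconsistent the average cancels and the ratio falls towards 0, which is what the cutoff detects.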

The angle spanned by the signal was small enough for the entire particle to fit 
within the depth of field. Defocus effects are therefore avoided by using a reality 
constraint. A resolution of 1 nm at the same X-ray wavelength would require
measuring at high angle, leading to significant deviation from the projection 
image. Real-value constraints would not work in the latter case, and this would 
make the reconstruction more challenging but by no means impossible (see, for 
example, ref. 34). 

Transmission electron microscopy was performed with a Hitachi H-7100 electron
microscope on unstained mimivirus particles deposited on Formvar-coated
gold grids.
A route for improvements. There is a clear need to achieve higher resolution in 
single shots. This requires an increased photon flux on the sample as well as a wider 
dynamic range for detecting photons in the diffraction pattern. The LCLS is
capable of delivering very short X-ray pulses to outrun significant sample explosion
with more photons per pulse. Tighter focusing and an increased photon
output from the LCLS have already been achieved and will increase the flux on 
the samples in forthcoming runs. A broad dynamic range in detecting photons is 
necessary to avoid saturation at low angles. In this first set of experiments, there 
were already exposures with significantly higher resolutions than those reported 
here, but these exposures contained too many saturated pixels at low angles, 
preventing image reconstruction. Reliable image reconstruction needs a more 
efficient way of measuring low-angle diffraction data. A graded attenuator around 
the central hole of the pnCCD detector pair could help here. An additional pair of 
detectors placed far behind the first detector pair could record more of the low- 
angle data over a larger area. Maintaining sample integrity during injection is a key 
requirement in the experiment. More data on a diverse set of samples (such as cells, 
viruses and macromolecules) will be needed to map out the available parameter 
space. Hit rates could be increased by improved injection methods, using a narrower 
particle beam. A future extension to imaging single macromolecules will need these 
developments. 

Late Holocene methane rise caused by orbitally
controlled increase in tropical sources


Joy S. Singarayer, Paul J. Valdes, Pierre Friedlingstein, Sarah Nelson & David J. Beerling


Considerable debate surrounds the source of the apparently
‘anomalous’ increase of atmospheric methane concentrations
since the mid-Holocene (5,000 years ago) compared to previous
interglacial periods as recorded in polar ice core records.
Proposed mechanisms for the rise in methane concentrations
relate either to methane emissions from anthropogenic early rice
cultivation or to an increase in natural wetland emissions from
tropical or boreal sources. Here we show that our climate and
wetland simulations of the global methane cycle over the last glacial 
cycle (the past 130,000 years) recreate the ice core record and capture 
the late Holocene increase in methane concentrations. Our analyses 
indicate that the late Holocene increase results from natural changes 
in the Earth’s orbital configuration, with enhanced emissions in the 
Southern Hemisphere tropics linked to precession-induced modi- 
fication of seasonal precipitation. Critically, our simulations capture 
the declining trend in methane concentrations at the end of the last 
interglacial period (115,000-130,000 years ago) that was used to 
diagnose the Holocene methane rise as unique. The difference 
between the two time periods results from differences in the size 
and rate of regional insolation changes and the lack of glacial incep- 
tion in the Holocene. Our findings also suggest that no early agri- 
cultural sources are required to account for the increase in methane 
concentrations in the 5,000 years before the industrial era. 

Atmospheric methane (CH4) is a strong greenhouse gas influenced
by various natural (for example, wetlands, biomass burning) and
anthropogenic (for example, rice agriculture, enhanced biomass burning,
landfill) sources. The major sink of methane is oxidation in the
troposphere by reaction with hydroxyl radicals. Atmospheric hydroxyl
radicals are consumed by oxidation of volatile organic compounds
(VOCs), including isoprene. Consequently, biogenic VOC fluxes
indirectly influence the lifetime of methane in the atmosphere (currently
8.7 ± 1.3 years; ref. 7) by altering the concentration of hydroxyl
radicals.

Since the start of the industrial era, atmospheric methane concentra- 
tions have more than doubled to 1,780 parts per billion by volume 
(p.p.b.v.; ref. 7), whereas over the last four glacial cycles (420,000 years; 
420 kyr), polar ice core records indicate a natural range of ~700 p.p.b.v. 
(interglacials) to ~360 p.p.b.v. (glacials) (Fig. 1a). Multi-millennial
variations correlate strongly with orbital precession, owing to the predominant
response of tropical wetland emissions to low-latitude
Northern Hemisphere summer insolation via monsoons. However,
a particularly intriguing feature of these ice-core records is that the 
correlation between methane concentrations and Northern Hemi- 
sphere summer insolation (30° N, 21 June) appears to break down from 
the mid-Holocene onwards, when Northern Hemisphere summer 
insolation decreases as the atmospheric methane concentration slowly 
increases from ~550 p.p.b.v. at 5 kyr before present (BP) to
~675 p.p.b.v. shortly before the industrial era (Figs 1a and 2).

Several mechanisms have been proposed for this ‘anomalous’ late
Holocene trend. First, that emissions from circum-Arctic wetlands
increased in a slow response to warmer, stable Holocene conditions,
a mechanism supported by δ13CH4 measurements from Greenland ice
cores. Second, changes in the inter-polar gradient of methane
suggested increased natural tropical emissions. However, Northern
Hemisphere tropical wetland regions are believed to have experienced
a natural drying since the early Holocene. Third, a controversial
hypothesis posits that it results from early agricultural activity, particularly
human-formed wetlands created by irrigating rice paddies,
farming ruminants, burning biomass and human waste. It remains

Figure 1 | Time series of model and ice core data for the last glacial cycle.
a, Atmospheric CH4 concentrations from the Antarctic EPICA Dome C ice
core (aqua) and 30° N 21 June insolation (grey). b, Composite atmospheric
CO2 concentrations from Antarctic ice cores. c, Modelled global annual total
methane emissions from experiment ALL. d, Global annual isoprene emissions
from ALL. e, Modelled methane concentrations from ALL (blue diamonds)
compared with the EPICA ice core record (aqua). Error bars in c-e denote the
range of values obtained when the vegetation/wetland model was run with three
different mean climatologies taken from the HadCM3 ALL simulations.

Figure 2 | A comparison of model- and data-based CH4 concentrations
from the current interglacial and the previous interglacial (Eemian) periods.
a, Northern Hemisphere summer insolation at 30° N for the Holocene (red
line) and Eemian (green line). b, EPICA ice core CH4 record (grey line) and
modelled CH4 concentrations for the Holocene (red line and symbols), and
similarly for the Eemian (green line and symbols).


uncertain whether agricultural activities and population were sufficient
to influence global atmospheric composition by 5 kyr before
present (BP). However, although the idea of significant early anthropogenic
impacts on atmospheric methane levels has been questioned
on several fronts, no modelling studies have yet been able to explain
both the Holocene and pre-Holocene methane record.

Here we report model-based reconstructions of terrestrial wetland 
methane emissions and atmospheric concentrations over the last glacial 
cycle (130 kyr) due only to natural forcing mechanisms. Specifically, our 
study addresses the issues of (1) whether natural processes explain the 
late-Holocene methane increase and the pre-Holocene methane record, 
(2) the main location of increased methane emissions during the late 
Holocene, and (3) the relative importance of different natural processes 
in producing glacial-interglacial methane changes. We simulated 65
‘snapshots’ spanning 130 kyr of the last glacial cycle using the coupled
ocean-atmosphere Hadley Centre climate model (HadCM3). The
resulting climatologies were used to drive the Sheffield Dynamic
Global Vegetation Model (SDGVM) coupled to a wetlands methane
emission model to predict the location of vegetation, wetlands,
methane emissions and VOC emissions over this period. Modelled
emissions were used to reconstruct atmospheric methane concentrations
for the last glacial cycle using a simplified scheme that captures the
relevant atmospheric chemistry, drawing on earlier chemistry-model
experiments. We undertook three sets of experiments designed to
identify mechanisms of atmospheric methane increase: (1) varying
orbital configuration only (ORB-ONLY), (2) varying orbital configuration
and atmospheric greenhouse gas (CO2, CH4 and N2O) concentrations
(ORB+GHG), and (3) varying orbital configuration, greenhouse
gas concentrations, ice-sheet extent and sea level (ALL). See Methods 
for further details. 

Variations in modelled global methane emissions over the last glacial 
cycle exhibit several robust features which allow us to exclude specific 
candidate mechanisms for the late Holocene rise in methane. Methane 
emissions from experiment ALL display a high degree of correlation 
with 30° N summer insolation over the full glacial cycle (r² = 0.77,
excluding the late Holocene), reminiscent of the ice core methane 
record (Fig. 1a and c). These results also reproduce an early Holocene 
decrease in methane emissions followed by a smaller increase during 
the late Holocene. Given that the model contains no prescribed human
emissions, this increase suggests that the increases in methane concen- 
trations seen in the ice core record are predominantly natural. 

Modelled methane concentrations (Fig. 1e) demonstrate considerable
variation at the precessional scale and display a strong increase from 
5 kyr BP onwards in good agreement with the ice core record. This is 
primarily due to increases in wetland emissions, with a much smaller 
contribution (~20%) from isoprene variations (Supplementary Fig. 4). 
Crucially, the model simulates both the decrease in methane concentra- 
tions following the Eemian interglacial maximum and the late Holocene 
increase (Fig. 2). The difference between the Eemian and Holocene 
methane trends has previously been used as key evidence in support 
of the idea that early agriculture had a significant impact on atmospheric 
methane concentrations in the past 5 kyr (ref. 15). Here, we simulate 
both the Eemian and Holocene methane concentration variations 
without anthropogenic influence. 

Several processes could potentially be driving the methane rise 
during the late Holocene. We used sensitivity simulations to separate 
the effects of orbital forcing, changes in atmospheric CO2 concentration,
ice-sheet distribution, and sea level. Between 8 kyr BP and shortly
before the industrial era (~0.2 kyr BP), atmospheric CO2 increased by
~17 p.p.m.v. (Fig. 1b). This increase could influence wetland methane
emissions in two ways. First, it could induce warmer, wetter conditions,
which would promote increased plant primary productivity
and hence soil carbon available for methanogenesis. Second, it could
result in higher methane emissions by increasing primary production
through CO2 fertilization. In a separate experiment, we
forced SDGVM with the climate output of experiment ALL but with
a prescribed pre-industrial CO2 concentration (280 p.p.m.v.). This
methodology isolates the effect of radiative CO2 impacts by eliminating
CO2 fertilization. The results (ALL_FIXCO2) demonstrate that
CO2 fertilization contributes ~1/3 of the range of Last Glacial
Maximum to peak Holocene variations in experiment ALL methane
emissions (Fig. 3a). However, even after removing CO2 fertilization
effects, the rise in late Holocene methane emissions remains.

Figure 3 | Temporal and spatial patterns of modelled methane emission
changes for the last glacial cycle. a, Time series of modelled global annual total
CH4 emissions from the four experiments: ALL, ALL_FIXCO2, ORB+GHG
and ORB-ONLY. b, Anomalies in annual methane emissions (pre-industrial
minus 5 kyr BP) for simulation ALL. c, Anomalies in annual methane emissions
for the Eemian interglacial (116 kyr minus 122 kyr BP) for simulation ALL.
Dashed line in b and c represents the Equator, and numbers above and below
are the anomalies in methane emissions for the northern and southern
hemispheres in Tg C yr⁻¹.


3 FEBRUARY 2011 | VOL 470 | NATURE | 83 


©2011 Macmillan Publishers Limited. All rights reserved 


LETTER 


The ORB-ONLY experiment eliminates both radiative and fertilization
impacts of CO2. It exhibits the smallest variation in global methane
emissions of all the experiments (Fig. 3a), with a glacial-interglacial difference
approximately 50% of that in experiment ALL. As in the other
experiments, there is a strong correlation with Northern Hemisphere 
summer insolation. Importantly, during the Holocene, ORB-ONLY 
methane emission changes are of similar magnitude and sign as in the 
other experiments: the late Holocene increase is evident, even though all 
other climate driving forces have been fixed. Therefore, whether the 
slight increase in CO2 from the early to late Holocene is natural or
anthropogenic, it is not the main driving mechanism of late Holocene
methane. The results of the sensitivity experiments imply strongly that 
forcing of climate by changes in insolation played a significant role in the 
late Holocene methane increase. Slight variation occurs in the timing of 
the late Holocene increase (between 5 kyr and 3 kyr BP) among the
different experiments (Fig. 3a), which arises partly from variations in 
the boundary conditions between experiments, and also from inherent 
variability in the climatologies used to drive the SDGVM. The 
uncertainty from climate variability was quantified by running the 
SDGVM with three different 30-year average climatologies from each 
simulation to produce repeat values of methane and isoprene emissions. 
The full range of resultant values was used as the uncertainty estimate 
(Fig. 1c-e). The rise in methane during the late Holocene remains 
significant after these uncertainties are taken into consideration. 

We next address the mechanisms by which the last 5 kyr appear to 
deviate from the correlation of methane emissions with orbital precessional
variation. Examination of the simulated spatial distribution
of anomalies (value at 0 kyr BP minus value at mid-Holocene; Fig. 3b) 
shows that the Southern Hemisphere is the main location of increasing 
wetland emissions during the late Holocene. During this period there 
is negligible change in ice-sheet volume or atmospheric CO2 due to
relatively small variations in orbital configuration compared to the 
previous interglacial. In the ORB-ONLY experiment, methane emis- 
sions lead Northern Hemisphere summer insolation in general by 
3-5 kyr (Supplementary Fig. 5b), owing to the balance between the 
large methane sources (Eurasia and South America), which have dif- 
ferent seasonal cycles of precipitation and wetland emissions (Eurasia 
restricted to June to August, whereas South America emissions extend 
over November to May; see Supplementary Fig. 7). However, when 
glacial-interglacial variation is simulated through additional prescription
of ice-sheets and CO2 changes, these influence the relative magnitudes
of emissions from the different regions and their seasonality;
the changes over time in the proportion of emissions from different 
regions change the phase of global methane variation subtly, depend- 
ing on the degree of glaciation and size of insolation variation. In the 
last glacial, the methane emissions from South America are smaller 
and much less variable on the precessional timescale. By comparison, 
emissions from Eurasia and East Asia (Supplementary Fig. 2) decrease, 
but maintain a strong variation with insolation. For further informa- 
tion, see Supplementary Information. The balance of these regional 
factors results in the overall correlation of global methane emissions 
with summer insolation at 30° when all forcings are included. 

This correlation appears to break down in the mid-Holocene but not in
the Eemian because the Holocene lacks a glacial inception, which exerts
subtle changes in the strength of the different source regions.
The regional anomalies (value at 0 kyr minus value at 5 kyr BP) in
experiment ALL show that the Southern Hemisphere increases in 
methane emissions are larger than the concurrent decreases in the 
Northern Hemisphere (Fig. 3b). Anomalies for time slices of com- 
parable orbital configuration in the Eemian (value at 116 kyr minus 
value at 122 kyr BP) show that the increases in the Southern Hemisphere
are larger than those in the Holocene in line with the larger changes in 
insolation, but the increases in the Southern Hemisphere are out- 
weighed by the decline from Northern Hemisphere sources (Fig. 3c) 
due to the difference in latitudinal variation in insolation. The decreases 
at high northern latitudes are amplified in experiment ALL compared 
to ORB-ONLY (which leads 30° N summer insolation in the Eemian as
well as the Holocene) due to glacial inception at 116 kyr BP. The complexity
of the responses to the various regional forcings means that
concentrations cannot be derived from one insolation curve, and that
the ‘earth system’ modelling approach employed here is essential to
understand drivers of Pleistocene methane records.

The climate and vegetation models used in this study have been 
extensively validated against modern observations, as well as palaeo- 
proxy records in the case of HadCM3 (Online Methods). We have also 
quantified statistical uncertainties related to internal variability of the 
climate model for experiment ALL and demonstrated that the findings 
are robust. However, within the scope of this study we have not been 
able to quantify uncertainties arising from the physical parameteriza- 
tion values in the models. We recognize several other caveats asso- 
ciated with our approach. For example, there is no millennial-scale 
variation, although this is not essential for the longer timescale we are 
considering. The modelling approach neglects transient responses, such 
as the potential of boreal wetlands to respond to delayed circum-Arctic 
warming in the Holocene. Given the agreement between simulated and
measured methane concentrations obtained throughout the glacial 
cycle, we propose that including such effects is not crucial. The isoprene 
model is realistic but empirical in terms of its dependence on carbon
dioxide, and there is uncertainty in the mechanisms involved.

We conclude that the late Holocene increase in methane can be 
primarily ascribed to increasing emissions from the Southern Hemi- 
sphere tropics. In the Holocene, unlike the last interglacial, these 
increases are not counteracted by equivalent decreases in Northern 
Hemisphere emissions. We suggest therefore that direct anthropogenic 
influences are not necessary to explain the late Holocene methane 
record. 


METHODS SUMMARY 


We have performed multiple snapshot simulations with the Hadley Centre
coupled atmosphere-ocean climate model, HadCM3. Three sets of snapshot
simulations were conducted, each consisting of 65 model runs covering the past
130 kyr, at a frequency ranging from every 4,000 years at the start (120-80 kyr
BP), to every 2,000 years (80-22 kyr BP) and to every 1,000 years at the end (22 kyr
BP to the ‘pre-industrial’ 0 kyr BP). Experiment ALL has glacial-interglacial
changes to ice-sheet volume derived from the ICE5G model, greenhouse gases
and orbital configuration. Experiment ORB+GHG includes changes to orbital
configuration and greenhouse gases, and ORB-ONLY has only changes to orbital 
configuration (other forcings fixed at pre-industrial magnitudes). In all simula- 
tions the initial conditions were based on a spun-up pre-industrial simulation and 
each was run for 500 years. The results presented here are mean climatologies of 
the last 30 years of each simulation. 

SDGVM was used to simulate patterns of net primary production (NPP), leaf 
area index (LAI) and plant functional types using the HadCM3-derived monthly 
inputs of temperature, precipitation, relative humidity and cloudiness, and global 
data sets of soil texture. Seasonal patterns of wetland methane emissions from
terrestrial ecosystems were computed with a process-based model describing the
dependence of anaerobic microbial methane production and aerobic oxidation on
temperature, gross primary productivity, soil respiration and soil water table
depth. Monthly trace gas VOC emissions (isoprene, monoterpene and others)
are calculated using a global scheme coupled to SDGVM modified to account for
monthly LAI changes. CO2 dependence of isoprene emissions was parameterized
using an empirically based function. Atmospheric methane concentrations were
calculated by a first order approximation using the experiments from a previous
modelling study which estimated the sensitivity of global methane concentrations
to isoprene and methane emissions.


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 

METHODS

HadCM3 experiments. We have performed multiple ‘snapshot’ simulations with
the Hadley Centre climate model, HadCM3. HadCM3 consists of coupled dynamic
ocean, atmosphere and sea-ice models. The sea-ice model includes parameterizations
of ice drift and leads. The resolution of the atmospheric model is 2.5° in latitude
by 3.75° in longitude by 19 unequally spaced levels in the vertical. The spatial resolution
over the ocean in HadCM3 is 1.25° by 1.25° by 20 unequally spaced layers in the
ocean extending to a depth of 5,200 m. In this version of the model, interactive
vegetation is not included. The HadCM3 model has been validated against modern
observations. The model has been compared to other climate models within the
Paleoclimate Model Intercomparison Project (PMIP1 and PMIP2) under pre-industrial,
mid-Holocene (6 kyr BP) and Last Glacial Maximum (21 kyr BP) conditions.
In a previous study we have also compared the glacial cycle snapshot simulations used
here against ice core palaeo-records. The global glacial-interglacial temperature
range compares well with other models and the palaeo-record. Temperature
changes directly over Greenland and East Antarctica are underestimated compared
to ice core reconstructions, a bias that is common to all global climate models. Low-latitude
variation in precipitation in the inter-tropical convergence zone is similar to
that in other models, with a small northerly bias in West Africa.

We have performed three sets of snapshot simulations, each consisting of 65 model
runs covering the whole of the past 130 kyr (ref. 16), at a frequency ranging from every
4,000 years at the start of the period (between 120 kyr BP and 80 kyr BP), to every
2,000 years from 80 kyr BP to 22 kyr BP and to every 1,000 years from 22 kyr BP to
0 kyr BP. The 0 kyr time slice has ‘pre-industrial’ boundary conditions for ~1850 AD.
Experiment ALL has glacial-interglacial changes to ice-sheet volume and extent,
greenhouse gases and orbital configuration. Experiment ORB+GHG includes
changes to orbital configuration and greenhouse gases, and ORB-ONLY has only
changes to orbital configuration, with all other forcings fixed at pre-industrial magnitudes
(for this experiment time slices were run for 0-70 kyr BP and 114-130 kyr BP).
Atmospheric concentrations of CO2 were taken from EDC96 (0-22 kyr BP), Taylor
Dome (22-62 kyr BP) and Vostok (62-363 kyr BP) ice cores, and CH4 and N2O
were taken from EPICA ice cores; all are on the EDC3 timescale. We have developed
our ice sheet reconstructions using the ICE5G model. This data set includes a
detailed evolution of the ice thickness, extent, and continental isostatic rebound for
the whole period from the LGM to the modern at 500-year intervals. We used an
anomaly-based method to calculate our palaeogeographic boundary conditions.
Anomalies of a particular time-slice palaeogeography minus pre-industrial ICE-5G
data are then added to our model pre-industrial geographical boundary conditions.
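
The anomaly-based method described above can be sketched in a few lines of Python. This is only an illustration under simple assumptions: fields are represented as flat lists of grid-cell values on a shared grid, and the function name and the example values are hypothetical.

```python
def palaeo_boundary(model_pi, ice5g_slice, ice5g_pi):
    """Add the ICE-5G anomaly (time slice minus pre-industrial) to the
    model's own pre-industrial field, cell by cell."""
    if not (len(model_pi) == len(ice5g_slice) == len(ice5g_pi)):
        raise ValueError("fields must share the same grid")
    return [m + (s - p) for m, s, p in zip(model_pi, ice5g_slice, ice5g_pi)]


# Hypothetical surface heights in metres for two grid cells.
print(palaeo_boundary([100.0, 0.0], [1500.0, 0.0], [200.0, 0.0]))
```

Working with anomalies rather than absolute values keeps the model's own pre-industrial geography as the baseline, so differences between the model grid and the reconstruction at 0 kyr are not mistaken for palaeogeographic change.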

In all simulations the initial conditions were the same, based on a spun-up pre-industrial
simulation. Each simulation was run for 500 years. The approach of
using pre-industrial initial conditions for all simulations is a reasonable compromise
necessary to enable us to run the simulations simultaneously, meaning it took
months rather than years to complete the total set. The results presented here are
climatologies of the last thirty years of each simulation.
SDGVM vegetation and wetland model. SDGVM simulates global patterns of net
primary production (NPP), leaf area index (LAI) and the distribution of plant functional
types from monthly inputs of temperature, precipitation, relative humidity and
cloudiness, and global data sets of soil texture. Core modules of net photosynthesis,
stomatal conductance, canopy transpiration, uptake of mineralized nitrogen
and responses of these attributes to changes in soil water supply are detailed, and
rigorously evaluated against field observations. A key feature of the model is the
coupling of above- and below-ground carbon and nitrogen cycles. Litter production
influences soil C and N pools via the Century soil nutrient cycling model, which in
turn feed back to influence above-ground primary production.

Local and global-scale predictions of NPP, LAI and plant functional type distribution
by the SDGVM have been extensively and successfully evaluated against
a wide range of measurements, field observations and satellite products.
The SDGVM-simulated geographical distribution of plant functional types is in close
agreement with maps of ‘actual’ vegetation. Global terrestrial net primary
production for the contemporary climate and CO2 is estimated with SDGVM to
be 62 Gt C yr⁻¹, in agreement with the satellite-based estimate of ~55-60 Gt C yr⁻¹.
The sensitivities of SDGVM NPP predictions to CO2 and climate are similar to those
of other dynamic vegetation models. The CO2 fertilization response of NPP
compares favourably to that reported in Free Air Carbon Dioxide Enrichment
(FACE) experiments for temperate forested sites.

Seasonal patterns of wetland CH4 emissions from terrestrial ecosystems were
computed with a process-based model describing the dependence of anaerobic
microbial CH4 production and aerobic oxidation on temperature, vegetation
activity (gross primary productivity), soil respiration and soil water table depth.
The CH4 model is coupled to SDGVM and HadCM3 on a monthly time-step.
Wetland CH4 emissions were modified to include the effects of orography by
scaling with a linear function of sub-grid orographic variance. The coupled
wetland model produces regional and latitudinally averaged CH4 fluxes similar
to those of ref. 28, and has been validated against observations. Biomass burning
from a fire module within SDGVM contributes to total CH4 emissions. This
assumes 80% of the above-ground carbon and nitrogen are lost by fire when the
litter content reaches critical dryness. CH4 emissions from termites and oceans
are assumed to be the same as those in the pre-industrial but with altered distributions
according to the land-sea areas. These procedures probably mean that the
contribution of biogenic CH4 fluxes from these sources is conservative.

The wetland model produces monthly trace gas VOC emissions (isoprene, monoterpene
and other VOCs) using a global scheme, modified to account for monthly LAI
changes. Soil biogenic NOx fluxes are calculated with an empirical model describing
their dependence on temperature, precipitation, and canopy deposition. Lightning
NOx emissions were calculated based on the convective precipitation amount. To
account for recent, although uncertain, evidence of a CO2 dependence of the emission
rate of isoprene, we have used a function to describe this which was estimated from
empirical measurements in figure 6 of ref. 20, to modify global isoprene emissions.
Methane concentration calculations. To keep the computational expense
feasible, atmospheric methane concentrations were calculated by a first order
approximation which used the sensitivity experiments from a previous modelling
study to estimate the sensitivity of global methane concentrations to isoprene/
monoterpene (which influence the concentrations of the methane sink, hydroxyl
radicals) and methane emissions. That study used the atmospheric-only version of
the Hadley Centre model (HADAM3) to model climate for pre-industrial and
LGM. SDGVM was used to reconstruct vegetation, wetland methane emissions,
biomass burning and VOCs, as well as other related variables, and the atmospheric
chemistry model STOCHEM was used to reconstruct atmospheric methane concentrations.
Subsequently, the chemistry model was run with LGM conditions
except for pre-industrial methane emissions or isoprene/monoterpene emissions,
or pre-industrial climate to assess the sensitivity of atmospheric methane concentrations
and lifetime to each factor in turn. Here we use these sensitivities and assume
linear relationships between methane concentrations and emissions of methane and
isoprene. Atmospheric methane concentrations are calculated as in equation (1):


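$$\mathrm{CH_4}(t) \approx \mathrm{CH_4}(\mathrm{PI}) + A\,[\mathrm{CH_4\_E}(\mathrm{PI}) - \mathrm{CH_4\_E}(t)] + B\,[\mathrm{ISO}(\mathrm{PI}) - \mathrm{ISO}(t)] \qquad (1)$$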


Here CH4(t) is global methane concentration at time t (kyr BP); CH4_E(t) is global
methane emission rate; ISO(t) is isoprene emission rate; PI denotes pre-industrial;
sensitivity of methane concentrations to methane emissions is A = 3.17 p.p.b.v. per
Tg C yr⁻¹; sensitivity of methane concentrations to isoprene emissions is
B = 0.26 p.p.b.v. per Tg yr⁻¹.
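
The linear approximation in equation (1) can be sketched in a few lines of Python. This is only an illustration: the function name and the input values are hypothetical, and the signs follow the equation as it is printed here rather than any independent derivation. Only the sensitivities A and B are taken from the text.

```python
def methane_concentration(ch4_pi, e_pi, e_t, iso_pi, iso_t, a=3.17, b=0.26):
    """Evaluate equation (1) as printed.

    ch4_pi        : pre-industrial methane concentration (p.p.b.v.)
    e_pi, e_t     : methane emission rate, pre-industrial and at time t
    iso_pi, iso_t : isoprene emission rate, pre-industrial and at time t
    a, b          : sensitivities A and B quoted in the text
    """
    return ch4_pi + a * (e_pi - e_t) + b * (iso_pi - iso_t)


# Hypothetical inputs, chosen only to exercise the formula.
baseline = methane_concentration(700.0, 150.0, 150.0, 500.0, 500.0)
shifted = methane_concentration(700.0, 150.0, 140.0, 500.0, 500.0)
print(baseline, shifted)
```

With identical emissions at time t and in the pre-industrial, both difference terms vanish and the function returns the pre-industrial concentration unchanged.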
Alternative stable states explain unpredictable
biological control of Salvinia molesta in Kakadu


Shon S. Schooler, Buck Salau, Mic H. Julien & Anthony R. Ives


Suppression of the invasive plant Salvinia molesta by the salvinia 
weevil is an iconic example of successful biological control. 
However, in the billabongs (oxbow lakes) of Kakadu National 
Park, Australia, control is fitful and incomplete. By fitting a process- 
based nonlinear model to thirteen-year data sets from four billa- 
bongs, here we show that incomplete control can be explained by 
alternative stable states’ *—one state in which salvinia is suppressed 
and the other in which salvinia escapes weevil control. The shifts 
between states are associated with annual flooding events. In some 
years, high water flow reduces weevil populations, allowing the shift 
from a controlled to an uncontrolled state; in other years, benign 
conditions for weevils promote the return shift to the controlled 
state. In most described ecological examples, transitions between 
alternative stable states are relatively rare, facilitated by slow-moving 
environmental changes, such as accumulated nutrient loading 
or climate change*®. The billabongs of Kakadu give a different 
manifestation of alternative stable states that generate complex 
and seemingly unpredictable dynamics. Because shifts between 
alternative stable states are stochastic, they present a potential 
management strategy to maximize effective biological control: when 
the domain of attraction to the state of salvinia control is 
approached, augmentation of the weevil population or reduction 
of the salvinia biomass may allow the lower state to trap the system. 

Awareness and concern about the ecological consequences of alterna- 
tive stable states is growing as more examples have been identified'*””. 
In many examples, the alternative states are very stable, so that in the 
absence of an extraordinary perturbation, the system remains at its 
‘natural’ state. Concern arises because states can change abruptly even 
when the environmental drivers responsible for the change occur 
gradually; if the ecological system is perturbed by a slow-moving 
driver, the system may remain largely unchanged until it reaches a 
threshold catastrophe and abruptly shifts to another state’. At present, 
there is a theoretical enterprise to identify the early warning signs of 
these abrupt shifts’®'’. Equally disturbing, once the shift occurs the 
system will show hysteresis’; even if the environmental perturbation 
were reversed, the system would stay at its new state, inhibiting the 
ability of managers to repair the system to its desired state’”. 

Ecological systems, however, are subjected to stochastic and cyclic 
perturbations, and if alternative states are weakly stable and perturba- 
tions are large enough, then shifts between states may be routine’”’. 
Alternative stable states may thus generate underlying forces that govern 
the stochastic dynamics of the system, leading to complex and seemingly 
irregular, eruptive behaviour. In fact, it may be difficult to identify the 
alternative stable states, yet at the same time be difficult to understand 
the dynamics of the system without first identifying that alternative 
stable states exist. 
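
A minimal sketch of how perturbations can move a system between weakly stable states, using a generic textbook system (dx/dt = x - x^3, with stable points at -1 and +1 and a threshold at 0). This is not the salvinia-weevil model; the function name, the kick sizes and the step at which the kick is applied are all illustrative assumptions.

```python
def simulate(x0, kicks=None, steps=5000, dt=0.01):
    """Euler-integrate dx/dt = x - x**3, adding optional one-off kicks.

    kicks maps a step index to a perturbation added at that step.
    """
    kicks = kicks or {}
    x = x0
    for step in range(steps):
        x += dt * (x - x ** 3)      # deterministic pull towards -1 or +1
        x += kicks.get(step, 0.0)   # external perturbation, if any
    return x


# Start at the lower stable point and apply a single perturbation.
small_kick = simulate(-1.0, {100: 0.5})   # stays below the threshold at 0
large_kick = simulate(-1.0, {100: 1.5})   # crosses the threshold
print(small_kick, large_kick)
```

The small kick leaves the system in the original domain of attraction, so it relaxes back; the large kick carries it across the boundary, after which it settles at the other stable point and does not return when the perturbation is removed.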

Salvinia molesta, a South American aquatic plant, is one of the most 
widespread and environmentally, economically and socially destructive 
invasive plant species. Since 1939, it has invaded lake and river systems in 
tropical and subtropical habitats around the world™. Its success is due
to its ability to double in biomass every 3-4 days, and to regenerate
vegetatively even after severe damage or drying’*'®. It is capable of
forming dense mats up to 1-m thick that make waterways unnavigable 
and displace aquatic organisms’*. Nonetheless, highly successful bio- 
logical control is often provided by the salvinia weevil (Cyrtobagous 
salviniae, Curculionidae) that since 1980 has been introduced into most 
regions where salvinia has invaded’’. The salvinia weevil is a strict 
specialist on salvinia; adults feed on growing meristematic tissue (buds), 
whereas larvae tunnel through vascular tissues’*, which together often 
lead to marked reductions in salvinia with no additional expenditure of 
resources '*"”, 

Salvinia invaded Kakadu National Park in 1983'°, and the salvinia 
weevil was released later that year'*. Although the weevils rapidly 
established and successfully controlled salvinia for several years, in 
1988-1990 salvinia resurged to form thick mats. This led to an intensive 
research project conducted by the Australian Commonwealth Scientific 
and Industrial Research Organization (CSIRO) in Kakadu’* and the 
establishment of a long-term monthly sampling program (Fig. 1). 
Unlike lake systems that experience continuously successful biological 
control of salvinia, the Kakadu billabongs are subjected to annual flood- 
ing that flushes salvinia downstream, mixing it among billabongs within 
the same floodplain. Floods also translocate salvinia to and from the 
billabongs and moist terrestrial sites; salvinia persists in these terrestrial 
sites during the dry season, sometimes at high biomass, where it has a 
refuge from the strictly aquatic weevils. Salvinia also occurs in the 
understory of grasses that grow over water along billabong edges where 
it is partially protected from both flooding and weevils’*. Thus, whereas 
the billabongs of Kakadu are highly perturbed, salvinia has refuges 
against both flooding and weevils from which it can reinvade open 
water. 

Fieldwork during 1991-1994 led to the following hypothesis for the
failure of continuous biological control. When salvinia is at low density,
it has relatively high nitrogen content and high growth rate. Salvinia is
thus highly susceptible to biological control, and the many developing
buds support high weevil numbers'’. Conversely, when salvinia is at
high density, growth rates are relatively low, and new buds are scarce. In
this condition much of the plant biomass occurs as vegetative,
non-meristematic tissue often piled high above the water surface. This
salvinia is much less suitable for weevils and is thus difficult to control.
Although not stated as such, this is a hypothesis of alternative stable states
determined by the growth state of salvinia: a low biomass state maintained
by weevil attack, and a high biomass state that has escaped weevil
control. As found in other cases of alternative stable states, the
salvinia-weevil system involves a species that has distinct life forms’”® and a
herbivore that can potentially lose control of its food plant*!””.

Standard analyses of the time-series data from the four billabongs
reveal none of the dynamical hallmarks of successful biological control:
there is little evidence that weevil damage is associated with declines in
salvinia biomass (Supplementary Information). Further, changes in
salvinia biomass between monthly samples are characterized by frequent
small steps and occasional large jumps, indicating that nonlinear
processes are driving salvinia dynamics (Supplementary Information).
Therefore, we built a nonlinear model from what we know about the
biology of the system (Box 1). Briefly, the model assumes that salvinia
biomass is divided into two categories, one with copious buds that is
vulnerable to weevils and the other that is not, with the proportion
represented by these two categories depending on total salvinia
biomass'*. The growth of salvinia biomass depends on the amount of
vulnerable tissue. Weevils attack this vulnerable tissue non-uniformly,
so that attacks can be aggregated among buds. The population growth
rate of the weevils depends on the amount of salvinia biomass damaged.
There is net migration of salvinia into billabongs from external areas or
the grass understory along billabong edges, and mortality/flushing of
both salvinia and weevils that depends on the water flow through the
drainages. There is also stochastic variation that affects the per capita
growth rates of both salvinia and weevils, and this variation can increase
with increasing water flow. The latter property accounts for possible
increases in unpredictable flushing or filling of billabongs with salvinia
or weevils during flood events. The model fits the data well, with most
parameters showing statistically strong effects on the observed
dynamics (Fig. 2 and Supplementary Information).

Figure 1 | Log biomass of salvinia in four Kakadu billabongs. Grey line
shows data for Jabiluka, dashed line for Minggung, black line for Jaja and dotted
line for Island. The vertical axis is scaled to have mean zero across all billabongs.

Figure 2 | Fitted model to log salvinia biomass (black) and logit weevil
damage (grey) for four billabongs. a-d, Dots give the raw data, and lines give
the updated values from the Kalman filter (Box 1). Data and fitted model are
standardized to have mean zero across all billabongs. The log water flow
measured at monitoring stations in each drainage is given by the line at the
bottom of the figure and is standardized to have mean zero and variance one
(note different axis).

Key biological insights from the fitted parameter values of the model
include the following (Table 1). The proportion of salvinia biomass in
the category that is invulnerable to weevil attack, g, is generally high,
ranging from g_min = 0.91 to g_max = 0.94, which is consistent with weevils
attacking meristematic tissues (adults) and vascular tissue (larvae).
Flooding events reduce the abundance of weevils (d_w < 0), yet have
little net effect on the mean abundance of salvinia (d_s = 0). However,
flooding events increase the variability in sample-to-sample fluctuations
in salvinia biomass (σ_s2 > 0; see Table 1). These two patterns could be
caused by flooding events moving salvinia among billabongs, between
billabongs and surrounding land, and from the grass understory into
open water; this would simultaneously increase the variance in salvinia
biomass and cause a net reduction of weevil abundance as salvinia
colonizes billabongs from weevil-free refuges.

Table 1 | Estimates from the best-AIC fitting model of the biologically
relevant parameters

Parameter   Value     Description
g_min       0.91      Minimum weevil-invulnerable tissue
g_max       0.94      Maximum weevil-invulnerable tissue
b_1         5.41      Salvinia morphology inflection point
b_2         385.7     Slope of morphology change at b_1
a           0.09      Weevil attack rate
k           3.38      Aggregation parameter for weevils
c           0.94      Weevil reproduction scale parameter
m           0.0085    Salvinia net immigration
v           0.79*     Salvinia self-regulation form parameter
d_s         0‡        Change in salvinia with water flow, δ_s(z_t) = d_s z_t
d_w         -0.24     Change in weevils with water flow, δ_w(z_t) = d_w z_t
σ_s1        0.21      Salvinia process error, var{ε_s(z_t)} = (σ_s1 exp(σ_s2 z_t))² τ
σ_s2        0.35      Water flow effect on salvinia process error
σ_w1        0.42      Weevil process error, var{ε_w(z_t)} = (σ_w1 exp(σ_w2 z_t))² τ
σ_w2        -0.08*    Water flow effect on weevil process error

AIC, Akaike information criterion. The model without alternative stable states
(g_min = g_max, b_1 = b_2 = 0) has a ΔAIC of 48.76 and provides a statistically
significantly inferior fit to the data (χ² = 52.76, P < 0.001). For the model,
R² = 0.73 and 0.38 for log salvinia biomass and logit weevil damage,
respectively. All values are statistically significant by likelihood ratio tests
except as follows. *Not different from 1 (ΔAIC = 1.08). †Not different from 0.01
(ΔAIC = 1.82). ‡Not different from 0 (ΔAIC = 1.86).


3 FEBRUARY 2011 | VOL 470 | NATURE | 87

©2011 Macmillan Publishers Limited. All rights reserved

LETTER

Existence of the alternative states requires the proportion of salvinia 
biomass in the category vulnerable to weevil attack to change with 
salvinia biomass; if the model is constrained so that this proportion
does not change (g_min = g_max), then alternative stable states are
impossible, and the fit of the model is statistically significantly reduced
(likelihood ratio test, χ² = 52.76, P < 0.001). This provides strong
support for the existence of alternative stable states. In three of the 
billabongs, the system spends time in the domains of attraction to both 
stable states, whereas Minggung has generally high salvinia abundance 
and rarely occupies the domain of weevil control (Fig. 3). We fit the 
model assuming that the underlying processes were identical across all 
billabongs. The poor salvinia control in Minggung could be due simply 
to the stochastic nature of the dynamics, by chance never staying long 
in the region of weevil control. There may be other differences between 
Minggung and other billabongs that we cannot identify, although such 
differences are not required to explain the observed dynamics. 
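
The reported likelihood ratio statistic and ΔAIC can be cross-checked with a short calculation. This is a sketch under a stated assumption: the degrees of freedom of the test are not legible here, so the code assumes two, which is the value implied by the reported numbers (52.76 - 2 × 2 = 48.76) rather than something the text states. The function names are illustrative.

```python
import math


def chi2_sf_2df(x):
    """Upper-tail probability of a chi-squared variable with 2 d.f.

    For 2 d.f. the survival function has the closed form exp(-x / 2).
    """
    return math.exp(-x / 2.0)


def delta_aic(lr_statistic, dropped_params):
    """AIC difference between a constrained model and the full model.

    lr_statistic   : likelihood ratio statistic, 2 * (logL_full - logL_constrained)
    dropped_params : number of parameters removed by the constraint (assumed)
    """
    return lr_statistic - 2 * dropped_params


lr = 52.76                      # statistic reported in the text
p_value = chi2_sf_2df(lr)       # far below the quoted 0.001 threshold
implied = delta_aic(lr, 2)      # compare with the reported value of 48.76
print(p_value < 0.001, implied)
```

The conclusion P < 0.001 does not depend on the assumption: with a statistic this large, the tail probability is far below 0.001 for any small number of degrees of freedom.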

Owing to the annual flooding events and high stochasticity in the
system, the fit of the model relies not only on the existence of alternative
stable states, but also on the transient dynamics of the corresponding
deterministic model. Although the deterministic model gives alternative
stable states when the logarithm of water flow is fixed at its mean
value, when the log water flow is fixed at one standard deviation below
its mean, there is only a single stable point, and when fixed at one
standard deviation above its mean, the weevil is eliminated from the
system (Supplementary Information). Therefore, as the water flow
regime fluctuates through its annual cycle, the alternative stable points
alternately disappear; a possible, although still incomplete, description
is that there are two alternative, environmentally forced
cycles that have separate domains of attraction (Supplementary
Fig. 8). Nonetheless, the ‘ghost’ of the boundary between stable
points still affects the transient dynamics of the system®”***. An added
complexity is that the dynamical forces around the two alternative
stable states differ. In the domain of attraction to the state with high
salvinia biomass, changes in biomass are slow compared to changes in
weevil damage, as illustrated by the deterministic trajectories of the
model (Fig. 3). This allows salvinia biomass to dynamically wander
between high and moderate values. The domain of attraction to the
lower stable state contains trajectories that tend to approach the stable
point through increases in both salvinia biomass and weevil damage,
causing a positive correlation between these two variables. All of these
dynamical patterns contribute to the strength of fit of the model to the
data.

Our analyses point to possible opportunities to foster biological
control of salvinia at Kakadu. For many examples of ecological systems
with alternative stable states, the stability of the states makes transitions
between them ecologically difficult and operationally challenging
from a management perspective*”*. The Kakadu billabongs, however,
are highly stochastic and experience periodic flooding, giving a window
of opportunity to shift the system between states**”°. Even for Minggung,
where biological control has been ineffective, the system occasionally
jumps into the domain of attraction to the lower state of salvinia
control. This occurs towards the end of the dry season after weevil
populations recover from depression during flooding. If weevil control
could be augmented at this time by inoculating Minggung with infested
salvinia from other billabongs, the system could be captured in the
lower domain of attraction. An alternative strategy would be to
chemically or mechanically reduce salvinia as it is recovering from a
flushing event, thereby allowing weevils more time to exert control.
There is no guarantee that these strategies would work on the first, the
second, or even the third try. However, our theoretical demonstration
that this lower state probably exists for Minggung should give hope for
repeated attempts. Although alternative states are generally thought to
present severe management challenges, when they are identified and
understood, they may also present management solutions*””.

Figure 3 | Phase portraits of logit weevil damage against log salvinia
biomass for the four billabongs. a-d, Values have been standardized to have
mean zero across all billabongs. Plotted trajectories (black dots and lines) are
updated values fitted to the data, thereby smoothing the data to account for
measurement error (Box 1). The dashed grey line divides the domains of
attraction to the two stable points, marked with grey dots, and example
deterministic trajectories are included (solid grey lines) when the drainage
water flow is fixed at its mean value, the time step between samples is assumed
to be τ = 8.83 days, and there is no stochasticity in the model. The grey cross
gives the equilibrium abundance of salvinia when there is no stochasticity and
the log water flow is fixed at one standard deviation below its mean value. The
grey arrow gives the deterministic abundance of salvinia in the absence of
weevils which occurs when the flow rate is one standard deviation above its
mean value (Supplementary Information).
BOX 1
Salvinia-weevil model


We fit a single model to the data simultaneously for all four billabongs,
thereby assuming that their dynamics are governed by the same
processes. The model has a nonlinear state-space form, with one set
of equations describing the biological processes driving the dynamics
and the other describing the sampling used to generate the data. Both
process and measurement equations contain stochastic elements,
with process error encapsulating environmental variation and
measurement error describing any deviations between the ‘true’ state
of the process variables and the data.

The process equations are:

$$x_{t+1} = \left\{ \left[ x_t\, g(x_t) + x_t (1 - g(x_t))\, e^{r\tau} \left(1 + (1 - g(x_t))\, x_t\right)^{-v} (1 - p_t) \right] \exp(\delta_s(z_t)) + m\tau \right\} \exp(\varepsilon_s(z_t))$$

$$y_{t+1} = \left\{ c\, x_t (1 - g(x_t))\, e^{r\tau}\, p_t \exp(\delta_w(z_t)) \right\} \exp(\varepsilon_w(z_t))$$

where x_t and y_t are the salvinia biomass and weevil abundance at time t,

$$p_t = 1 - \left(1 + \frac{a \tau y_t}{k}\right)^{-k}$$

is the proportion of susceptible salvinia attacked by weevils assuming that
the distribution of attacks is given by a negative binomial with aggregation
parameter k, and τ is the time interval between model iterations (averaging
8.83 days). The function g(x_t) gives the proportion of salvinia biomass in
the category invulnerable to weevil damage. g(x_t) follows an inverse-logit
function with minimum and maximum values g_min and g_max, inflection point
where X_t = b_1, and slope at the inflection point given by b_2. The input
variable z_t is the log water flow measured at monitoring stations in each
drainage, and δ_s(z_t) and δ_w(z_t) give the response of salvinia and weevil
survival rates dependent on water flow.

The measurement equations are:

$$X_t^{*} = X_t + \log\!\left(g(e^{X_t})\right) + C + \alpha_s$$

$$W_t^{*} = \operatorname{logit}(p_t) + \alpha_w$$

where X_t = log x_t is the ‘true’ log salvinia biomass from the process
equations, X_t* is the observed log salvinia biomass assuming that only
the invulnerable category is sufficiently dense to occur in visual
sampling, and W_t* is the logit of the observed weevil damage. The
constant C is an overall scaling term for salvinia biomass, because the
process equations are non-dimensional. The random variables α_s and
α_w give measurement error, with α_s assumed to have a Gaussian
distribution with mean 0 and variance σ²_ms, and α_w assumed to have a
quasi-binomial distribution with variance

$$\left(\frac{1}{n} + \sigma_{mw}^{2}\right) \frac{1}{p_t (1 - p_t)};$$

if σ²_mw = 0, this would be the variance under a binomial distribution with
a sample size of n, but to allow greater-than-binomial variance, we
also estimated σ²_mw.

The model was fit using an extended Kalman filter to estimate the
likelihood function?®. The Kalman filter is an iterative algorithm in which
the ‘true’ population sizes and estimates of their uncertainties are
projected forward using the model. When an iteration coincides with a
sample point, the true population sizes are updated using the observed
values and the estimates of the measurement error; if the measurement
error is small, then updating pulls values closer to their observed values.
The model values plotted in Figs 2 and 3 are these updated values. The
parameter r, the intrinsic rate of increase of the salvinia, was set at 0.08
per day as determined by extensive experiments at Kakadu (M.H.J.,
unpublished data). Backwards model selection was performed to find
the best-fitting model, and the best-fitting model was confirmed with
forward selection. Likelihood ratio tests were used to assess the
statistical significance of key variables. Model parameters are described
in Table 1, and detailed descriptions of the model and fitting procedure
are given in the Supplementary Information.
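
The updating step described in Box 1 can be illustrated with a scalar, linear Kalman update. This is a simplified sketch, not the extended filter used for the model fit: it handles a single state variable, and the function name and the numbers are illustrative assumptions.

```python
def kalman_update(pred_mean, pred_var, observed, meas_var):
    """Combine a model prediction with an observation (scalar, linear case).

    pred_mean, pred_var : projected state and its uncertainty
    observed            : sampled value at this time step
    meas_var            : variance of the measurement error
    """
    gain = pred_var / (pred_var + meas_var)      # weight given to the data
    mean = pred_mean + gain * (observed - pred_mean)
    var = (1.0 - gain) * pred_var                # uncertainty shrinks after updating
    return mean, var


# Small measurement error: the update lands close to the observation.
precise_mean, precise_var = kalman_update(0.0, 1.0, 2.0, 0.01)
# Large measurement error: the update stays close to the prediction.
noisy_mean, noisy_var = kalman_update(0.0, 1.0, 2.0, 100.0)
print(precise_mean, noisy_mean)
```

The gain expresses the trade-off stated in the box: as the measurement error shrinks relative to the prediction uncertainty, the gain approaches one and the updated value is pulled towards the observed value.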
A thymus candidate in lampreys


Baubak Bajoghli, Peng Guo, Narges Aghaallaei, Masayuki Hirano, Christine Strohmeier, Nathanael McCurley,
Dale E. Bockman, Michael Schorpp, Max D. Cooper & Thomas Boehm


Immunologists and evolutionary biologists have been debating the 
nature of the immune system of jawless vertebrates—lampreys and 
hagfish—since the nineteenth century. In the past 50 years, these fish 
were shown to have antibody-like responses and the capacity to reject 
allografts’ but were found to lack the immunoglobulin-based adap- 
tive immune system of jawed vertebrates”. Recent work has shown 
that lampreys have lymphocytes that instead express somatically 
diversified antigen receptors that contain leucine-rich-repeats, 
termed variable lymphocyte receptors (VLRs)**, and that the type 
of VLR expressed is specific to the lymphocyte lineage: T-like lym- 
phocytes express type A VLR (VLRA) genes, and B-like lymphocytes 
express VLRB genes’. These clonally diverse anticipatory antigen 
receptors are assembled from incomplete genomic fragments by 
gene conversion®’, which is thought to be initiated by either of 
two genes encoding cytosine deaminase’, cytosine deaminase 1
(CDA1) in T-like cells and CDA2 in B-like cells’. It is unknown
whether jawless fish, like jawed vertebrates, have dedicated primary 
lymphoid organs, such as the thymus, where the development and 
selection of lymphocytes takes place’*"’. Here we identify discrete 
thymus-like lympho-epithelial structures, termed thymoids, in the 
tips of the gill filaments and the neighbouring secondary lamellae 
(both within the gill basket) of lamprey larvae. Only in the thymoids 
was expression of the orthologue of the gene encoding forkhead box 
N1 (FOXN1)"°, a marker of the thymopoietic microenvironment in
jawed vertebrates’, accompanied by expression of CDA1 and
VLRA. This expression pattern was unaffected by immunization
of lampreys or by stimulation with a T-cell mitogen. Non-functional 
VLRA gene assemblies were found frequently in the thymoids but 
not elsewhere, further implicating the thymoid as the site of develop- 
ment of T-like cells in lampreys. These findings suggest that the 
similarities underlying the dual nature of the adaptive immune 
systems in the two sister groups of vertebrates extend to primary 
lymphoid organs. 

The long-standing question of whether lampreys have a thymus led 
to the description of several circumscribed lymphoid accumulations in 
the gill basket of lamprey larvae as candidates for the thymus, for 
instance in the lateral branchial wall sinuses’* and the ‘thymic placode’ 
inside the epipharyngeal folds’* (Supplementary Information and 
Supplementary Fig. 1). However, histological surveys failed to reveal 
a thymus analogue unambiguously. We were prompted to re-examine 
the issue of a thymus equivalent in lampreys by the recent identifica- 
tion of two separate lineages of lymphocytes in lampreys, a finding 
indicating that the dual nature of the immune system extends to all 
vertebrates’. T-like cells of lampreys were shown to express VLRA on
their surface (denoted VLRA+ cells), whereas B-like cells express surface
and secreted forms of VLRB. Moreover, VLRA+ cells were shown
to express CDA1 preferentially, whereas VLRB+ cells express CDA2
(ref. 5). With respect to a putative thymus primordium in jawless fish,
the lamprey orthologue of FOXN1, a marker of the thymopoietic
microenvironment in jawed vertebrates'’’’, was found to be expressed
in lamprey larvae in a region of the gill basket where VLRB-expressing
lymphocytes could not be found'®. On the basis of the cell-type
specificity of gene expression in lamprey lymphocytes, we therefore
hypothesized that co-expression of VLRA and CDA1 or of VLRB and
CDA2 would distinguish the primary lymphoid organs in which these
T-like and B-like cells are generated in jawless vertebrates.

Expression of the VLR genes was detectable by RNA in situ hybridi- 
zation in cells of many tissues of European brook lamprey (Lampetra 
planeri) larvae. Cells expressing VLRA and VLRB messenger RNAs 
were found in the gill filaments, kidneys, typhlosole and blood (Fig. 1a, 
b). VLRA-expressing cells dominated in the gill basket, whereas VLRB- 
expressing cells were more frequent in the kidneys and typhlosole 
(Supplementary Fig. 2). These data are in agreement with previous 
flow cytometric analyses of cells from the closely related sea lamprey 
(Petromyzon marinus)? and indicate a tissue-specific pattern of distri- 
bution for the two lymphocyte lineages. 

Expression of the CDA genes was more spatially restricted than that 
of the VLR genes. When assayed by in situ hybridization, CDA1 
mRNA was detected at discrete sites that were located primarily at 
the tip of gill filaments. CDA1 was not expressed by cells in the liver 
(not shown), kidneys, typhlosole, blood or other tissues outside the gill 
basket (Fig. 1c). Conversely, cells expressing CDA2 were found preferentially
in the typhlosole, predominantly around the central blood
vessel, and occasionally in the kidneys and blood (Fig. 1d),
indicating that the sites of CDA1 and CDA2 expression do not overlap.
The patterns of co-expression of VLR and CDA genes in these distinct
anatomical sites (Fig. 1e) suggest that VLRA+ T-like cells might
develop in the gill region, whereas VLRB+ B-like cells might originate
in haematopoietic tissue, namely in the typhlosole and kidneys. This is
reminiscent of the situation in jawed vertebrates, in which T cells 
develop in the thymus and B cells develop primarily in the bone 
marrow or its functional equivalents. 

The thymus in jawed vertebrates is uniquely identifiable by thymocyte
expression of recombination-activating genes (RAGs) during
rearrangement of the genes that encode the T-cell antigen receptor,
as well as by epithelial cell expression of the gene encoding the
transcription factor FOXN1. It has previously been shown that the lamprey
orthologue of the thymic epithelial marker FOXN1, designated FOXN1
(or FOXN4L), is expressed in the epithelium of the gill basket of
lamprey larvae'’. To explore the possibility that a thymus-like structure
might be found in the gill basket of lamprey larvae, we searched for
sites at which markers of thymic epithelial cells (FOXN1 and Delta-like
B (DLL-B)) and of lymphocytes (VLRA and CDA1) were expressed
together. All four genes were found to be expressed in the same region
of the gill filaments (Fig. 2a). Double in situ hybridization confirmed
the association of VLRA and CDA1 expression in the thymoids and
indicated that some VLRA-expressing cells also express CDA1. By
contrast, VLRB-expressing cells do not express CDA1 (Fig. 2b) and
are absent from the thymoid region (Figs 1b and 2b). To examine the
micro-anatomical relationship between epithelial and lymphocyte
markers, we used a DLL-B probe that specifically detects the expression
of a putative FOXN1 target gene encoding a Delta-related Notch
ligand", as a marker of epithelial cells, and CDA1, as a marker of
developing T-like cells. In the thymoid, these cells are located in close
proximity (Fig. 2c). Analysis by immunofluorescence confirmed the
presence of VLRA-producing cells and the absence of VLRB-producing
cells in the thymoid, whereas both cell types were found
in blood vessels (Fig. 2d). Histological analysis by light microscopy
showed that the thymoid contains both epithelial cells and lymphocytes
(Fig. 2e). Electron micrographs confirmed that these cells are
present in close proximity in the thymoid (Fig. 2f) and indicate that
phagocytosis of cellular debris occurs at this site (Fig. 2g). Only a small
subset of CDA1-expressing cells proliferate (Supplementary Fig. 3),
suggesting cell-cycle-specific regulation of this lamprey gene belonging
to the AID-APOBEC family. Cells expressing CDA1 were found in the
tips of the gill filaments encompassing the entire gill basket (Fig. 2h
and Supplementary Fig. 4), indicating that the thymoids are not confined
to a specific pharyngeal arch. These findings in L. planeri were
confirmed for P. marinus (Supplementary Fig. 5), indicating that
the specific gene expression patterns are a general characteristic of
lampreys. We conclude that the lympho-epithelial structure identified
in the gill basket bears the diagnostic hallmarks expected for a thymus
equivalent: namely expression of CDA1 (a gene that is predicted to be
associated with somatic diversification of the VLRA locus) together
with VLRA expression, and the concordant expression of FOXN1 with
DLL-B (one of the key target genes of FOXN1).

Figure 1 | Tissue-specific expression of VLR and CDA genes in L. planeri
larvae. a-d, Gene-specific expression, as determined by RNA in situ
hybridization with gene-specific riboprobes, is shown in blue. a, VLRA
expression. b, VLRB expression. c, CDA1 expression. d, CDA2 expression.
e, Histological structure. Haematoxylin and eosin (H&E) staining of sections
equivalent to those used for RNA in situ hybridization. The intestinal contents
(*) (primarily yeast) cause non-specific staining in a-d and are also seen in
e (#). BV, blood vessel; IW, intestinal wall; KT, kidney tubule; T, typhlosole.
Scale bars, 100 µm.

In jawed vertebrates, the thymus is a primary lymphoid organ, 
which, unlike secondary lymphoid tissues such as lymph nodes and 
the spleen, does not change its structure during an immune response. 
We applied this criterion to the thymoid. Lamprey larvae mount 
strong immune responses against various antigens, including the exo- 
sporium of Bacillus anthracis*. Immunization with this antigen 
(Fig. 3a) leads to a general proliferative response in haematopoietic 
tissues’. There is a massive increase in cell proliferation in the typh- 
losole and kidneys, whereas the increase in the number of proliferating 
cells in the gill vasculature is more modest (Fig. 3b). Concomitantly, 
the number of VLRA-expressing cells increases in the typhlosole, and 
the number of VLRB-expressing cells increases in the blood vessels, 
being particularly noticeable in the cavernous bodies located at the
base of gill filaments (Supplementary Fig. 6). After intracoelomic injec- 
tion of phytohaemagglutinin (PHA) (Fig. 3a), this mitogen is distri- 
buted throughout the vasculature and rapidly accumulates in the gill 
region (Supplementary Fig. 7). PHA is known to cause a proliferative 
burst of VLRA⁺ cells’, and this general proliferative effect is clearly
seen by using in situ analysis in the gill region, kidneys and typhlosole 
(Fig. 3b). Stimulation with PHA also increased the number of VLRA- 
expressing cells in the gill region, kidneys and typhlosole (Supplemen- 
tary Fig. 6). Importantly, however, PHA does not induce proliferation 
of cells in the thymoid tips (Fig. 3b), and it does not change the pattern 
of CDA1 expression (Fig. 3c). The finding that cells in the thymoids fail
to respond to antigenic or mitogenic stimuli is compatible with the 
thymoid being a primary lymphoid organ rather than a secondary one. 

We sought more direct evidence that the thymoid is the site at which 
T-like cells develop. We examined the status of VLRA gene assembly in 
this region and elsewhere. Previously, it has been shown that VLRA⁺
lymphocytes typically assemble only one VLRA allele’®, the other allele 
being retained in the germ-line configuration. Furthermore, the 
assembled VLRA allele is almost always functional. Non-functional 
rearrangements of VLRA genes are exceedingly rare and are always 
accompanied by a functional assembly on the second allele’®. This 
observation suggests a process that ensures the selective development 
of VLRA⁺ lymphocytes. We used laser-capture microdissection to
procure genomic DNA from cells at the thymoid tip of the gill filament 
and, as a control, from within the blood vessels at the filament base 
(Fig. 4a). As anticipated, assembled VLRA genes could be amplified 
from the DNA isolated from both the thymoid and the blood. 
Figure 2 | Characterization of the lamprey thymoid. a, Concomitant
expression of epithelial markers (FOXN1 and DLL-B) and two lymphocyte- 
specific genes (CDA1 and VLRA) at the tip of gill filaments, as determined by 
RNA in situ hybridization with gene-specific riboprobes. Scale bars, 50 µm.
b, Co-expression of VLRA (blue) and CDA1 (red) in a subset of VLRA-
expressing lymphocytes (upper panels), as determined by RNA in situ 
hybridization with gene-specific riboprobes on adjacent sections, which are 
shown separately and as a merged image. Co-expression seems to occur 
preferentially in cells expressing low levels of VLRA (arrows indicate co- 
expressing cells). VLRB (blue) and CDA1 are not co-expressed (lower panels). 
Scale bars, 40 µm. Greater magnifications are shown to the right (scale bars,
5 µm) with quantitative assessments of gene expression presented underneath
(numbers indicate number of cells that express one or both genes). c, Close
proximity of cells expressing DLL-B (blue) and CDA1 (red) in the thymoid. 
RNA in situ hybridization was carried out with gene-specific locked-nucleic- 
acid-enhanced oligonucleotides on adjacent sections, which are shown 
separately and as a merged image. For the CDA1 panel, the images of the DNA 
counterstain (4’,6-diamidino-2-phenylindole (DAPI), blue) and the 
differential interference contrast are superimposed. Scale bar, 10 µm. The
rightmost panel shows a three-dimensional reconstruction of thymoid tissue 
with DLL-B- and CDA1-expressing cells indicated in different colours; other 
cells are shown only in outline. d, Immunofluorescence microscopy with
anti-VLRA (green) and anti-VLRB (red) antibodies. Intravascular lymphocytes are
indicated by arrows. Deposits of VLRB at the vessel walls are indicated by
arrowheads. DNA is counterstained with DAPI (blue). Scale bar, 50 µm.

e, Lympho-epithelial structure of the thymoid as revealed by light microscopy. 
Nucleated erythrocytes are visible in the large blood vessel (BV) underneath the 
thymoid. Scale bar, 30 µm. f, Lympho-epithelial structure of the thymoid as
revealed by transmission electron microscopy: lymphocytes (L), epithelial cells 
(E) and blood vessel (BV). The lumen of the gill chamber is at the top left. Scale 
bar, 2 µm. g, Evidence for phagocytosis (P) in the thymoid. Scale bar, 2 µm.
h, CDA1 expression is found at the tips of gill filaments (GF) and in the
epithelial linings of the epipharyngeal ridge (EP) and the hypopharyngeal fold 
(HP), as shown by RNA in situ hybridization with a gene-specific riboprobe on 
a transverse section (see also Supplementary Fig. 4). Scale bar, 300 µm. Sections
are from L. planeri in a (FOXN1, CDA1 and VLRA), e and h, and from P.
marinus in all other panels. 


Strikingly, however, non-functional sequences were found only in 
the thymoid fraction, whereas the sequences obtained from cells iso- 
lated from blood vessels were all functional (Fig. 4b and Sup- 
plementary Fig. 8). This difference indicates that non-functional 
assemblies impair the further development of VLRA⁺ lymphocytes.
Indeed, within the gill basket, caspase-3-positive apoptotic cells were 
also found primarily in the gill filament tip and neighbouring second- 
ary lamellae, closely resembling the distribution of CDA1-expressing 
cells (Fig. 4c). These findings are compatible with the idea of selection 
against thymoid cells that are undergoing non-functional VLRA 
assembly. It will be interesting to examine whether this is also true 
for VLRC, which was described recently and is structurally similar to 
VLRA”. 
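
The comparison reported in Fig. 4b (thymoid, n = 29; blood, n = 18; Fisher's exact test) can be illustrated with a minimal sketch. The sample sizes are taken from the figure legend and the blood fraction is treated as containing no non-functional sequences, as stated above. The number of non-functional thymoid sequences is not given in the text, so the value used here is a hypothetical placeholder, and the function is a generic two-sided test written for illustration rather than the authors' analysis.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x counts in the top-left cell
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Sum every table that is no more probable than the observed one
    p = sum(prob(x) for x in range(lowest, highest + 1)
            if prob(x) <= p_observed * (1 + 1e-9))
    return min(1.0, p)


THYMOID_N, BLOOD_N = 29, 18              # sample sizes from the Fig. 4b legend
hypothetical_nonfunctional_thymoid = 10  # placeholder, NOT a value from the paper

p_value = fisher_exact_two_sided(
    hypothetical_nonfunctional_thymoid,
    THYMOID_N - hypothetical_nonfunctional_thymoid,
    0,        # blood: all sequences functional, per the text
    BLOOD_N,
)
print(f"two-sided P = {p_value:.4f}")
```

Replacing the placeholder with the observed count would reproduce the test for the actual data; the sketch only shows how the 2 x 2 table is laid out.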

To evaluate the stringency of this selection process, we investigated 
whether non-functionally assembled VLRA genes could be identified 
in VLRA⁻VLRB⁻ cells isolated from the blood and gills. Although it
was possible to amplify assembled VLRA genes from VLRA⁻VLRB⁻
cells with lymphocyte light-scattering features, two rounds of
amplification were required to detect them (Fig. 4d). Essentially all of these
relatively rare VLRA assemblies were non-functional, in contrast to the
status of VLRA genes in VLRA⁺VLRB⁻ cells (Fig. 4e and
Supplementary Fig. 9). Notably, VLRA assemblies were not found in
VLRA⁻VLRB⁺ cells (Fig. 4d). Together, these findings indicate a
remarkable efficiency of selection for VLRA⁺ cells.

Here we describe a previously unrecognized lympho-epithelial struc- 
ture in the gill basket of lamprey larvae. The thymoid bears the hallmarks 
of a tissue where lymphocytes undergo VLRA assembly and selection for 
the expression of functional VLRA genes, which encode the anticipatory 
antigen receptor of T-like lymphocytes. Our findings suggest that this 
structure is a candidate for a thymus in lampreys. The dispersed nature 
and relatively inconspicuous morphological appearance of the lamprey 
thymoid structure could explain why it has gone unnoticed for so long 
Figure 4 | Selection of VLRA-expressing lymphocytes in the thymoid.

a, Micrographs showing the site of tissue procurement, by laser-capture 
microdissection (LCM), at the gill filament tip region of the thymoid. Scale bars, 
50 µm. b, Proportion of non-productive sequences among assembled VLRA
genes. The difference between the thymoid and blood is significant at P < 0.01
(Fisher’s exact test) (thymoid, n = 29; blood, n = 18; n, number of sequences). 
ND, not detectable. c, Apoptotic cells (left), as detected with anti-caspase-3 
antibody (red), preferentially occur in the thymoid, as detected by CDA1 
expression (centre, blue). In the left panel, DNA is counterstained with DAPI 
(blue). In the centre panel, anatomical landmarks are indicated: tip of gill 
filament (t), first secondary lamella (1), second secondary lamella (2) and so on. 
A numerical analysis of the distributions of caspase-3-positive and CDA1- 
expressing cells relative to the relevant anatomical landmarks of gill filaments is 
shown in the right panel; randomly selected sections were used (caspase-3,
n = 36; CDA1, n = 17; n, number of sections; error bars, s.e.m.). Scale bars,
100 µm. d, Assembled VLRA genes are detectable in VLRA⁻VLRB⁻ (double
negative, DN) cells separated by FACS after isolation from the blood or gills of
lamprey larvae but not in VLRA⁻VLRB⁺ (B⁺) cells. e, Non-productive
assemblies of VLRA genes predominate in VLRA⁻VLRB⁻ cells. By contrast, in
VLRA⁺VLRB⁻ cells, essentially all assembled VLRA genes are productive.
VLRA⁻VLRB⁻ cells, n = 42 for gills, n = 33 for blood; VLRA⁺VLRB⁻ cells,
n = 33 for gills, n = 39 for blood; n, number of sequences. Samples are from L.
planeri in a and b and P. marinus in c-e. 


Figure 3 | Stimulation does not affect cell proliferation and gene expression 
in the thymoid of P. marinus. a, Time points for injection with B. anthracis 
exosporium, the lectin PHA and the nucleoside analogue 5-ethynyl-2’- 
deoxyuridine (EdU). b, Representative sections revealing the number and 
location of proliferating cells (EdU, green), as determined by 
immunofluorescence microscopy. DNA is stained with Hoechst stain (blue). 
The tips of the gill filaments are shown at higher magnification in the upper 
panels. c, Distribution of CDA1-expressing cells in gill filaments, as determined 
by RNA in situ hybridization with a gene-specific riboprobe. Scale bars, 100 µm.
and why it was revealed only by extensive gene expression analyses. The
identification of the thymoid in lampreys provides a starting point for 
more detailed comparative studies between jawless and jawed verte- 
brates, guided by the wealth of information about thymopoiesis in jawed 
vertebrates. Finally, these findings suggest that a common vertebrate 
ancestor had not only T-like and B-like lymphocyte lineages but also 
anatomically distinct tissues in which these cells could develop in a 
spatially segregated manner. 


METHODS SUMMARY 


Lampetra planeri larvae were sampled in the Freiburg area (Germany). 
Petromyzon marinus larvae were collected from tributaries to Lake Michigan 
(Michigan). RNA in situ hybridization analyses were performed on paraffin- 
embedded tissue sections. Specific antibodies were used to detect VLRA⁺ and
VLRB⁺ lymphocytes in situ. Cells undergoing apoptosis were detected with
caspase-3-specific antibodies, and proliferating cells were marked by incorpora- 
tion of 5-ethynyl-2'-deoxyuridine and detected by Click-iT reaction. Preparative 
flow cytometry was used to separate lymphocyte populations according to their 
VLRA and VLRB expression. PCR with specific primers was used to amplify germ- 
line and assembled forms of VLRA genes for subsequent cloning and sequence 
analysis. Laser-capture microdissection of paraffin-embedded tissue sections was 
used to procure genomic DNA from specific regions. Transmission electron 
microscopy was used for high-resolution tissue analysis. 


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 
METHODS


Animals and immune stimulation. Lampetra planeri (5-12 cm long and 1-3 
years of age) were sampled in the Freiburg area (Germany) with the permission 
of the local authorities. Petromyzon marinus larvae (8-12 cm long and 2-4 years of 
age) were from Lake Michigan tributaries in Michigan (Lamprey Services). Fish 
were maintained in sand-lined aquaria at 18 °C and fed brewer’s yeast. Larvae were 
immunized with Bacillus anthracis exosporium or stimulated with phytohaemag- 
glutinin (PHA) (Sigma) as described previously*. Briefly, larvae were anaesthetized 
with 0.1 g l⁻¹ ethyl 3-aminobenzoate methanesulphonate (MS-222) (Sigma) and
were given 60 µl intracoelomic injections of B. anthracis exosporia (10 µg) or
PHA-L (leukoagglutinin; 25 µg) prepared in sterile 0.67× PBS buffer.
Immunization with exosporia was carried out on days 0 and 14, and tissues were
collected after animals were killed by administering 1 g l⁻¹ MS-222 on day 28.
PHA was administered as a single injection, and animals were killed on day 9 after
injection. For proliferation assays, lampreys were injected with
5-ethynyl-2'-deoxyuridine (EdU) (5 µg) (Invitrogen) in 60 µl 0.67× PBS and
returned to aquaculture for 24 h before being killed.

Flow cytometry and cell sorting. Leukocytes were isolated from the blood and 
other tissues of P. marinus larvae before being separated by FACS as previously 
described’. Briefly, blood was collected in 0.67× PBS containing 30 mM EDTA,
and buffy coats were prepared by centrifugation at 50g. Leukocytes were liberated 
from gills by mincing with frosted glass slides. Cells were stained with R110
anti-VLRA rabbit polyclonal serum and 4C4 anti-VLRB mouse monoclonal antibody,
followed by R-phycoerythrin-conjugated goat anti-rabbit antibody and
allophycocyanin-conjugated goat anti-mouse antibody (SouthernBiotech), and then were
sorted into VLRA⁺, VLRB⁺ and VLRA⁻VLRB⁻ populations on a FACSAria II
flow cytometer (Becton Dickinson). 

Genomic DNA and PCR. Genomic DNA was extracted from sorted cell populations
(VLRA⁺, VLRB⁺ and VLRA⁻VLRB⁻ cells in the lymphocyte gate of blood
and gill samples from P. marinus) using a DNeasy kit (QIAGEN). First-round
PCR was carried out with VLRA-F and VLRA-R primers (and Ex Taq polymerase, 
Takara). Reactions were amplified using the following: one cycle of 94 °C for
1 min; 35 cycles of 94 °C for 15 s, 56 °C for 20 s and 72 °C for 1 min; and one cycle
of 3 min at 72 °C. Second-round (nested) PCR was performed with VLRA-F2 and
VLRA-R2 primers. Reactions were amplified using the following: one cycle of
94 °C for 1 min; 35 cycles of 94 °C for 15 s, 57 °C for 20 s and 72 °C for 1 min;
and one cycle of 3 min at 72 °C. PCR products were cloned into the pGEM-T
vector (Promega). Paraffin-embedded sections (5-7 µm) of L. planeri were used
for laser-capture microdissection as previously described'*. Genomic DNA extrac- 
tion was performed using a PicoPure DNA extraction kit (Arcturus). VLRA genes 
from L. planeri tissues were amplified by nested PCR, the first amplification using 
primers vlra_1f and vlra_2r, the second amplification using primers vlra_3f and 
vlra_4r. Reactions were amplified using the following: first amplification, one cycle
of 96 °C for 2 min; 35 cycles of 96 °C for 15 s, 60 °C for 30 s and 72 °C for 2 min; and
one cycle of 5 min at 72 °C; and second amplification, as above but for 25 cycles 
instead of 35. The assembled VLRA gene fragments were then cloned into the 
pGEM-T vector. Clones were sequenced using the ABI 3730xl DNA Analyzer
(Applied Biosystems). Primer sequences are listed in Supplementary Table 1. 
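
The cycling conditions given above can be written down as data, which makes the four programs easy to compare. The following is a minimal sketch: the temperatures, hold times and cycle numbers are those stated in this section, whereas the class and function names are illustrative and not part of the original protocol. Ramp times between steps are ignored, so the totals are lower bounds on actual run time.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Step:
    temp_c: int    # block temperature in degrees Celsius
    seconds: int   # programmed hold time


@dataclass(frozen=True)
class Program:
    initial: Step
    cycles: int
    cycle_steps: Tuple[Step, ...]
    final: Step

    def hold_seconds(self) -> int:
        """Sum of programmed holds; ramp times between steps are ignored."""
        per_cycle = sum(step.seconds for step in self.cycle_steps)
        return self.initial.seconds + self.cycles * per_cycle + self.final.seconds


def p_marinus(anneal_c: int) -> Program:
    # Sorted-cell PCR: first round anneals at 56 °C, nested round at 57 °C
    return Program(Step(94, 60), 35,
                   (Step(94, 15), Step(anneal_c, 20), Step(72, 60)),
                   Step(72, 180))


def l_planeri(cycles: int) -> Program:
    # Microdissected-tissue PCR: 35 cycles first, 25 cycles for the nested round
    return Program(Step(96, 120), cycles,
                   (Step(96, 15), Step(60, 30), Step(72, 120)),
                   Step(72, 300))


programs = {
    "P. marinus first round": p_marinus(56),
    "P. marinus nested round": p_marinus(57),
    "L. planeri first amplification": l_planeri(35),
    "L. planeri second amplification": l_planeri(25),
}
for name, program in programs.items():
    print(f"{name}: {program.hold_seconds() / 60:.1f} min of programmed holds")
```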
Immunofluorescence microscopy. For analysis of lymphocyte distribution in P. 
marinus larvae, dissected tissues were fixed for 12 h in 0.67× PBS containing 2%
paraformaldehyde at 4 °C, cryopreserved in 30% sucrose, embedded in OCT
compound (Tissue-Tek, Sakura) and sectioned at 7 µm on a cryostat (Thermo).
Sections were permeabilized in PBS containing 10% goat serum, 0.5% saponin, 
10 mM HEPES buffer and 10 mM glycine. They were stained for 1 h with R110 and 
4C4 antibodies, followed by 1h with the appropriate Alexa-Fluor-conjugated 
secondary antibodies (Invitrogen) before being mounted in ProLong Gold with 
4',6-diamidino-2-phenylindole (DAPI) solution (Invitrogen). Fluorescence 
microscopy was performed with an Axiovert 200M microscope (Zeiss), equipped 
with a ×40 objective (numerical aperture, 0.6; ocular magnification, ×10).
For detection of apoptotic cells, frozen sections (10-15 µm) were fixed in 4%
paraformaldehyde for 15 min, washed several times in PBS, and permeabilized 
with PBS containing 0.2% Triton X-100 detergent for 5 min, then washed again in 
PBS. To block non-specific binding sites, the sections were incubated for 1h in 5% 
normal goat serum diluted in blocking solution (PBS containing 0.1% Tween 20 
detergent). Sections were then incubated with rabbit anti-active caspase-3 (G7481, 
Promega; at a 1:250 dilution in blocking solution) for 16 h at 4 °C. After washing in 
PBS containing 0.1% Tween 20, a Cy3-conjugated donkey anti-rabbit antibody 
(1:500 in blocking solution) was applied for 30 min. Sections were washed several 
times in PBS and, after drying, were mounted in Fluoromount-G and 4’,6-diamidino- 
2-phenylindole (DAPI). Micrographs were taken using an Axio Imager.Z1 and 
ApoTome microscope (Zeiss). 
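
The antibody dilutions quoted above (1:250 and 1:500 in blocking solution) translate into pipetting volumes as in the following minimal sketch. It assumes the common convention that 1:N means one part stock in N parts total, and the 500 µl working volume is a hypothetical value chosen only for illustration; neither is specified in the protocol.

```python
def dilution_volumes(total_parts: int, final_volume_ul: float):
    """Return (stock_ul, diluent_ul) for a 1:total_parts dilution.

    Assumes 1:N means one part stock in N parts total volume.
    """
    if total_parts < 1 or final_volume_ul <= 0:
        raise ValueError("total_parts must be >= 1 and final volume positive")
    stock_ul = final_volume_ul / total_parts
    return stock_ul, final_volume_ul - stock_ul


FINAL_VOLUME_UL = 500.0  # hypothetical working volume, not from the protocol

for label, parts in (("rabbit anti-active caspase-3", 250),
                     ("Cy3-conjugated donkey anti-rabbit", 500)):
    stock, diluent = dilution_volumes(parts, FINAL_VOLUME_UL)
    print(f"{label} (1:{parts}): {stock:g} µl stock + {diluent:g} µl blocking solution")
```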

For analysis of cell proliferation, lampreys were embedded in OCT and
cryosectioned. EdU detection was performed using the Click-iT EdU Alexa Fluor 488 Flow
Cytometry Assay Kit (Invitrogen). Briefly, frozen sections (10-15 µm) were fixed
with 4% paraformaldehyde for 15 min at 20 °C and washed in PBS. Sections were 
permeabilized with PBS containing 0.5% Triton X-100 and then washed in PBS. 
Sections were then incubated with the EdU reaction cocktail for 30 min at 20 °C. 
After several washes in PBS, sections were incubated in Hoechst stain (1:2,000 in 
PBS) for 30 min. Sections were washed several times in PBS and, after drying, were 
mounted with Fluoromount-G. For EdU detection in paraffin-embedded tissue, 
sections were deparaffinized and rehydrated, and then permeabilized and pro- 
cessed as above. Images were processed with the program Photoshop (Adobe). 
Electron microscopy. Killed P. marinus larvae were immersed in Karnovsky’s 
fixative (paraformaldehyde and glutaraldehyde) for 2h. Thin cross sections were 
cut from gill regions and prepared for transmission electron microscopic analysis 
as described previously’. Sections were examined with a JEM-1230 transmission 
electron microscope (JEOL). Images were recorded digitally using the UltraScan 
4000 imaging system (Gatan). 
In situ hybridization analysis. RNA in situ hybridization was performed 
with digoxigenin (DIG)-labelled RNA riboprobes as described previously’’. 
Probe sequences are listed in Supplementary Table 2. For DLL-B and CDA1 co- 
localization studies, in situ hybridization was carried out by use of short gene- 
specific locked-nucleic-acid (LNA)-enhanced oligonucleotides (Exiqon; see 
Supplementary Table 2) as described previously’®, with the exception that the 
temperature of hybridization was 54°C. Double in situ hybridization was carried 
out as follows. DIG- and fluorescein isothiocyanate (FITC)-labelled RNA 
antisense probes were hybridized to RNA in tissue sections simultaneously. The 
DIG-labelled probe was detected first, with an alkaline-phosphatase-conjugated 
anti-DIG antibody (1:2,000 dilution in maleic acid buffer (MAB); 100 mM maleic 
acid, pH 7.5, 150mM NaCl, 2mM Levamisol, 0.1% Tween 20 and 1% blocking 
reagent (Roche)). It was revealed by staining with BM Purple (Roche), according 
to the manufacturer’s instructions. The sections were washed several times with 
TNT solution (10 mM Tris-Cl, pH 7.5, 500 mM NaCl and 0.1% Tween 20), and the 
FITC-labelled probe was detected by a peroxidase-conjugated anti-FITC antibody 
(1:500 dilution in MAB) and revealed by Cy5 fluorescence, using Tyramide Signal 
Amplification Plus Cy5 system (Perkin Elmer). For three-dimensional
reconstructions, alternate (5 µm) sections were hybridized with gene-specific LNA-enhanced
oligonucleotides. After the staining reactions, the thymoid regions of the gill fila- 
ments were photographed using an Imager Z1 microscope (Zeiss). A series of TIF 
files indicating the positions of the CDA1- and DLL-B-expressing cells, and the 
boundaries of neighbouring cells, were generated using the program CorelDRAW
Graphics Suite 11. Each series of images was aligned using DeltaViewer software 
(http://delta.math.sci.osaka-u.ac.jp/DeltaViewer/) to build the three-dimensional 
image. 
Loss of kidney function underlies many renal diseases’. Mammals
can partly repair their nephrons (the functional units of the kidney), 
but cannot form new ones*’. By contrast, fish add nephrons 
throughout their lifespan and regenerate nephrons de novo after 
injury*’, providing a model for understanding how mammalian 
renal regeneration may be therapeutically activated. Here we trace 
the source of new nephrons in the adult zebrafish to small cellular 
aggregates containing nephron progenitors. Transplantation of 
single aggregates comprising 10-30 cells is sufficient to engraft 
adults and generate multiple nephrons. Serial transplantation 
experiments to test self-renewal revealed that nephron progenitors 
are long-lived and possess significant replicative potential, consist- 
ent with stem-cell activity. Transplantation of mixed nephron pro- 
genitors tagged with either green or red fluorescent proteins yielded 
some mosaic nephrons, indicating that multiple nephron progeni- 
tors contribute to a single nephron. Consistent with this, live 
imaging of nephron formation in transparent larvae showed that 
nephrogenic aggregates form by the coalescence of multiple cells 
and then differentiate into nephrons. Taken together, these data 
demonstrate that the zebrafish kidney probably contains self- 
renewing nephron stem/progenitor cells. The identification of these 
cells paves the way to isolating or engineering the equivalent cells in 
mammals and developing novel renal regenerative therapies.

Zebrafish nephrons in the adult kidney are similar to those found in 
the embryonic kidney*, except that they are highly branched and 
drained by two central collecting ducts (Fig. 1a and Supplementary
Fig. 2a-j). We confirmed that zebrafish nephron number increases with
age (Fig. 1b), similar to other fish*®. To identify the source of new 
nephrons in adult zebrafish, we first characterized the effects of genta- 
micin injection, an established nephrotoxin’. Intraperitoneal injection 
of gentamicin induced nephron damage, downregulated the proximal 
tubule marker slc20a1a and resulted in a failure to take up filtered
40 kDa fluorescent dextran® by 1 day post-injection (Fig. 1c-f,
n = 6/6; Fig. 1j-k, n = 8/8; Supplementary Fig. 2k-p). Around 4 days
post-injection, partial restoration in nephron function was observed,
suggesting some nephrons recovered from the injury (Fig. 1g, l, arrow).
At this stage we also detected small, but appropriately proportioned, 
nephrons that were dextran-positive, proliferating and basophilic, 
which are characteristic features of immature nephrons’ (approximately
15 per kidney; Fig. 1i, l inset, n). By 15 days post-injection the
damaged nephrons had recovered to near-normal levels, although 
immature nephrons could still be detected (Fig. 1h, m, arrow). 

If the adult kidney contains nephron progenitors responsible for the 
formation of new nephrons, then these cells might be amenable to 
transplantation. To test this, we developed a transplantation assay 
(Fig. 2a and Supplementary Fig. 3a-k) in which recipient fish were


immunocompromised by radiation to prevent graft rejection’ and 
then injected with gentamicin. Unpurified whole-kidney marrow cells 
(WKM), mostly comprising non-tubular interstitial cells’, were pre- 
pared from Tg(cdh17:EGFP)"® or Tg(cdh17:mCherry) donors that 
express fluorescent reporters in the distal nephron. Injection of approxi- 
mately 5 X 10° of these cells resulted in donor-derived nephrons in 
100% of the recipients (n=6) by 18 days post-transplantation 
(d.p.t.), with an average of 24 donor-derived nephrons (Fig. 2b, arrow, 
inset). Donor nephron number increased with time, reaching an average 
of 70 nephrons by 59 d.p.t. (Fig. 2c) and greatly expanded the head 
kidney on the injected side (Fig. 2d, arrow). At these later time points, we 
also found donor-derived nephrons in locations distant from the site of 
injection, which suggests that the transplanted cells are migratory 
(Supplementary Fig. 3l, arrowheads).

To regenerate damaged tissue successfully, newly created structures 
must incorporate into existing tissue. To determine whether the donor- 
derived nephrons were capable of blood filtration, we injected 40 kDa 
fluorescent dextran into transplant recipients that had received WKM 
from Tg(cdh17:mCherry) donor fish and dissected out individual 
nephrons. All of the donor-derived nephrons examined (n = 5) were 
dextran-positive (Fig. 2e), indicating that they had integrated into the 
recipient’s blood supply. These results show that nephron progenitors 
are present in the adult kidney and that after transplantation they are 
capable of forming new functional nephrons within the host’s renal 
tissue. 

Cell transplantation experiments can be confounded by the fusion 
of donor and recipient cells. To address this, we injected WKM from 
Tg(cdh17:mCherry) donors into Tg(cdh17:EGFP) recipients. If fusion 
occurred, we would expect to find nephrons positive for both mCherry 
and enhanced green fluorescent protein (EGFP). An analysis of 
engrafted recipients (n = 6) revealed that all of the mCherry-positive 
nephrons were EGFP-negative, providing evidence that they had not 
formed by cell-cell fusion. In addition, we identified the connection of 
the donor-derived nephrons with the host’s renal tubules, providing 
further evidence that the engrafted nephrons had successfully inte- 
grated into the recipient’s renal system (Fig. 2f). 

Lineage labelling studies in the developing mouse kidney have 
revealed that multiple Six2⁺ cap mesenchyme cells, the source of
nephron progenitors, contribute to a single nephron"’. To explore this 
in zebrafish, we transplanted a 1:1 mix of Tg(cdh17:EGFP) and 
Tg(cdh17:mCherry) WKM cells into conditioned recipients. Mosaic 
nephrons containing both EGFP-positive and mCherry-positive cells 
were found in 27% of the engrafted fish (n = 15; Fig. 2g), although the 
remaining nephrons were either all EGFP-positive or all mCherry- 
positive. Thus multiple nephron progenitors can contribute to an indi- 
vidual nephron. 
Figure 1 | The adult zebrafish kidney undergoes nephrogenesis throughout
life and after injury. a, Zebrafish kidney and nephron model (scale bar, 1 mm). 
b, Graph showing average number of renal corpuscles (RC) relative to body 
length (inset shows an RC labelled with dn-fgfr1-EGFP"*) (error bar, one 
standard deviation; n = 3 fish per time point). RC, renal corpuscle; N, neck; PI, 
proximal tubule I; PII, proximal tubule II; DE, distal early; DL, distal late. 

c, d, Kidney sections showing gentamicin-damaged nephrons (asterisks; scale 
bar, 10 µm). H&E, haematoxylin and eosin; d.p.i., days post-injection; DT,


Mammalian Six2⁺ cap mesenchyme cells are also characterized by
their stem-cell-like self-renewal properties'’. Serial transplantation is 
used to distinguish haematopoietic stem cells from progenitors'*. We 
investigated whether we could obtain donor-derived nephrons after 
serial transplantation of WKM from engrafted recipients (Fig. 2h). We 
transplanted WKM from primary fish containing 2-89 cdh17:EGFP⁺
donor-derived nephrons and achieved a 48% (n = 21) engraftment 
rate in secondary fish, with the number of donor-derived nephrons 
ranging from 1 to 53 by 41 d.p.t. The WKM from one of these secondary 
fish, containing 53 engrafted nephrons, was transplanted again and 
successfully engrafted a third time, giving rise to 12 donor-derived 
nephrons in the tertiary recipient at 35 d.p.t. (a total of 135 days from 
primary to tertiary fish; Fig. 2i-k). These results demonstrate that 
nephron progenitors possess significant proliferative potential, consist- 
ent with self-renewing capabilities. 
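
The engraftment figures in the text are reported as rounded percentages with a sample size (100%, n = 6; 27%, n = 15; 48%, n = 21). A minimal sketch of how such values map back to whole numbers of fish is given below. It assumes that n denotes the number of recipients scored and that percentages were rounded to the nearest integer; the function name is illustrative.

```python
def consistent_counts(percent: int, n: int):
    """Return every count k in 0..n whose rounded percentage equals `percent`."""
    if n <= 0:
        raise ValueError("n must be positive")
    return [k for k in range(n + 1) if round(100 * k / n) == percent]


reported = {
    "primary engraftment": (100, 6),
    "fish with mosaic nephrons": (27, 15),
    "secondary engraftment": (48, 21),
}
for label, (percent, n) in reported.items():
    print(f"{label}: {percent}% of {n} -> {consistent_counts(percent, n)} fish")
```

Under these assumptions each reported percentage corresponds to a single whole-number count, which is a quick consistency check on the stated sample sizes.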

We next sought to identify the cells responsible for nephron pro- 
genitor activity. We noted that approximately 0.1% of the WKM from 
distal tubule. e-h, Expression of slc20a1a in gentamicin-damaged kidneys
(scale bar, 0.5 mm). i, Immature nephron with dividing cells (arrow) (scale bar,
10 µm). j-m, Uptake of 40 kDa fluorescein isothiocyanate (FITC)-conjugated
dextran by gentamicin-damaged kidneys (inset in l shows an immature
nephron; arrow in l marks a positive nephron; arrow in m indicates an
immature nephron; scale bar, 30 µm). n, Serial sections showing that immature
basophilic nephrons take up 40 kDa dextran-rhodamine. M&B, methylene blue 
and basic fuchsin. 


Tg(cdh17:EGFP) fish is EGFP-positive (Supplementary Fig. 4a). To 
test whether cdh17:EGFP⁺ cells could contribute to new nephrons,
we sorted and transplanted this fraction (approximately 5,000 cells 
per fish) but failed to observe engraftment (n = 0/7). We subsequently 
explored other markers of nephron progenitors. In mammals, nephro- 
genesis initiates with the formation of ‘pre-tubular aggregates’ that 
undergo a mesenchymal-to-epithelial transition into renal vesicles’’. 
These structures express several transcription factors including 
Lhx1/Lim1 (ref. 14) and Wt1 (ref. 15). We therefore examined the
Tg(lhx1a:EGFP)'° and Tg(wt1b:mCherry) transgenic lines to determine
whether these reporters mark nephron progenitors. Kidneys from
untreated Tg(lhx1a:EGFP) adults were found to contain three distinctive
EGFP-positive cell populations: (1) single cells with a mesenchymal
morphology (Fig. 3a) that make up approximately 0.02% of the WKM
(Supplementary Fig. 4b), (2) homogeneous aggregates of lhx1a:EGFP⁺
mesenchymal cells ranging from a few to approximately 30 cells 
(Fig. 3b, c and Supplementary Fig. 4i) (approximately 100 aggregates 
Figure 2 | The adult zebrafish kidney contains transplantable progenitors
that form functional nephrons. a, Overview of the transplantation assay. b, A 
primary transplanted fish at 18 d.p.t. with cdh17:EGFP⁺ donor-derived
nephrons (arrow; inset, higher magnification view; scale bar, 0.5 mm). 

c, Average number of donor-derived nephrons over time (error bar, one 
standard deviation; n, total fish per time point). d, Head kidney of a recipient at
34 d.p.t. showing expansion of renal tissue caused by cdh17:mCherry⁺ donor-
derived nephrons (arrow; scale bar, 0.5 mm). e, A cdh17:mCherry⁺ donor-


per kidney) and (3) renal vesicle-like bodies (0-2 per kidney; Fig. 3d). 
The last two populations were highly reminiscent of pre-tubular aggre- 
gates and renal vesicles in mammals. 

An examination of Tg(lhx1a:EGFP;wt1b:mCherry) double transgenic
kidneys revealed that only the large aggregates and renal vesicles,
but not the other lhx1a:EGFP⁺ populations, were wt1b:mCherry⁺
(Fig. 3d). We hypothesized that the lhx1a:EGFP⁺/wt1b:mCherry⁺ renal
vesicle-like bodies, which were rare in untreated kidneys, constitute
primitive nephrons. Consistent with this, gentamicin treatment greatly
induced the formation of lhx1a:EGFP⁺/wt1b:mCherry⁺ double-positive
cells (data not shown) and activated the endogenous expression
of the early-acting renal genes pax2a, fgf8a, wt1a and wt1b in similar
structures (Fig. 3e-j and Supplementary Fig. 4c-h). We failed to detect
the expression of mature nephron markers in structures resembling
either lhx1a:EGFP⁺ aggregates or wt1b⁺ renal vesicles (Supplementary
Fig. 5a-c and Supplementary Table 1). Similarly, quantitative
PCR analysis of purified lhx1a:EGFP⁺ and cdh17:EGFP⁺ cells showed
that lhx1a:EGFP⁺ cells express considerably lower levels of mature
nephron markers than cdh17:EGFP⁺ cells (Supplementary Fig. 5d).
These findings suggest that lhx1a:EGFP labels nephron progenitors
and lhx1a:EGFP/wt1b:mCherry labels early-stage nephrons.

To clarify the lineage relationships between lhx1a:EGFP⁺ and
lhx1a:EGFP⁺/wt1b:mCherry⁺ cells, we took advantage of the optical
transparency of larval fish to visualize nephrogenesis in vivo. By 
observing Tg(cdh17:EGFP) larvae as well as using wholemount in situ 
hybridization, we found that adult kidney formation initiates at the 
5.2-mm stage (approximately 13 days post-fertilization). The first 
derived nephron showing functional uptake of 40 kDa FITC-conjugated 
dextran (scale bar, 30 µm). f, Connection of donor-derived nephrons 
(cdh17:mCherry+) with the cdh17:EGFP+ recipient’s renal system (scale bar, 
10 µm). g, A mosaic nephron arising from the co-injection of a mixture of 
cdh17:EGFP- and cdh17:mCherry-labelled nephron progenitors. h, Overview of 
the serial transplantation assay. i-k, Donor-derived nephrons (cdh17:EGFP+, 
arrows) in primary-, secondary- and tertiary-engrafted recipients (scale bar, 
0.5 mm). 


nephron appears consistently on the embryonic (pronephric) renal 
tubules just posterior to the swim bladder (Supplementary Fig. 6a-e 
and data not shown). lhx1a:EGFP+ cells appeared before this, at the 
4-mm stage (approximately 10 days post-fertilization) (Fig. 4a, arrow, 
inset), rapidly migrated along the pronephric tubules (Fig. 4b, arrows), 
and formed into aggregates (Fig. 4b, arrowheads). An in vivo time course 
of Tg(lhx1a:EGFP;cdh17:mCherry) and Tg(lhx1a:EGFP;wt1b:mCherry) 
larvae showed that the lhx1a:EGFP+ aggregates arose from the coalescence 
of three or four lhx1a:EGFP+ cells that expanded to form a renal 
vesicle and activated expression of wt1b:mCherry (Fig. 4d and Supplementary 
Fig. 6f). Similar time courses of Tg(lhx1a:EGFP;cdh17:mCherry) 
and Tg(wt1b:EGFP;pax8:DsRed) larvae demonstrated that the renal 
vesicle elongated into a cdh17+ nephron, with lhx1a+ cells becoming 
restricted to the point of fusion with the pronephric tubules, pax8 
initiating in the distal tubule and wt1b labelling the glomerulus and 
proximal tubule (Fig. 4e and Supplementary Fig. 6g). To demonstrate 
a requirement of lhx1a:EGFP+ cells for nephrogenesis, we ablated 
single lhx1a:EGFP+ aggregates with a laser (Fig. 4c, arrow), resulting 
in aborted nephrogenesis in the targeted region without affecting 
neighbouring nephrons (Fig. 4c, arrowhead) (n = 2/2). 

Next we tested whether lhx1a:EGFP+ cells had nephron-forming 
activity. Transplantation of single lhx1a:EGFP+ cells failed to engraft 
conditioned recipients. However, transplantation of individual 
lhx1a:EGFP+ aggregates resulted in successful engraftment in 33% 
(n = 15) of transplanted fish (Fig. 4f, g). In one case, a single aggregate 
contributed to 16 nephrons, 27 aggregates and numerous individual 
cells (Supplementary Fig. 7a-c), consistent with lhx1a:EGFP+ cells 
Figure 3 | Expression of lhx1a:EGFP and other renal factors in the adult 
kidney. a, Single mesenchymal cells labelled with lhx1a:EGFP (scale bar, 
10 µm). b, Small aggregates labelled with lhx1a:EGFP (scale bar, 10 µm). c, An 
untreated kidney showing lhx1a:EGFP+ aggregates in a portion of the trunk 
region (scale bar, 0.5 mm). d, Co-expression of wt1b:mCherry and lhx1a:EGFP 
in large aggregates (asterisk, autofluorescence; scale bar, 10 µm). e, f, Expression 


having extensive proliferative and self-renewing capabilities. Transplantation 
of lhx1a:EGFP+/wt1b:mCherry+ renal vesicles failed to 
engraft (n = 0/10), suggesting that nephron-forming potential is 
restricted to lhx1a:EGFP+ aggregates. These findings demonstrate that 
lhx1a:EGFP+ aggregates contain nephron progenitors and support 
our observation that multiple nephron progenitors are needed to form 
a nephron. 

To determine how similar lhx1a:EGFP+ cells are to Six2+ mouse 
cap mesenchyme cells, we conducted a microarray analysis, comparing 
the genes upregulated in lhx1a:EGFP+ cells (relative to cdh17:EGFP+ 
epithelial cells) with those upregulated in mouse Six2+ cells (relative to 
mouse proximal tubule epithelial cells). At a global level, the respective 
gene sets that are upregulated in lhx1a:EGFP+ cells and Six2+ cells are 
not significantly similar (Supplementary Tables 2-4 and Supplemen- 
tary Fig. 8a). However, there is conservation of several factors impli- 
cated in renal development and/or stem-cell self-renewal. Notably, 


of wt1b in untreated and gentamicin-damaged kidneys (scale bar, 0.5 mm). 
g-i, Expression of wt1b in a large aggregate (g), a comma-shaped body (h) and 
an immature nephron (i) in gentamicin-damaged kidneys (scale bar, 10 µm). 
j, Expression of pax2a, wt1a and fgf8a in large aggregates or renal vesicles in 
gentamicin-damaged kidneys (scale bar, 10 µm). 


orthologues of Six2 (six2a) and Wt1 (wt1a), which are essential for 
cap mesenchyme maintenance, are upregulated both in lhx1a:EGFP+ 
cells and in Six2+ cells. Quantitative PCR confirmed that six2a and 
wt1a are expressed over 15-fold higher in lhx1a:EGFP+ cells than 
cdh17:EGFP+ cells (Supplementary Fig. 8b). Several other potentially 
important regulators were also identified in the comparison, including 
Meis2, Ezh2 and Tcf3, which are implicated in Wnt signalling and/or 
stem-cell function (Supplementary Table 4). These results suggest that, 
despite having distinct molecular identities, zebrafish lhx1a:EGFP+ 
cells and Six2+ cells share a core set of regulatory genes that may be 
important for conferring renal stem/progenitor cell potential. 

In conclusion, we have identified an adult population of nephron 
progenitors that reside in small aggregates throughout the zebrafish 
kidney. These cells are uniquely defined by their ability to form new 
functional nephrons during zebrafish growth, injury and after trans- 
plantation. Nephron progenitors can be serially transplanted, consistent 
Figure 4 | lhx1a:EGFP+ cells form nephrons during adult kidney 
development and after transplantation. a, Lateral view of a 
Tg(lhx1a:EGFP;cdh17:mCherry) larva showing the first lhx1a:EGFP+ cell to 
appear on top of the cdh17:mCherry+ embryonic kidney tubules (arrow and 
inset). b, Lateral view of a Tg(lhx1a:EGFP) larva showing the extent of 
lhx1a:EGFP+ cell migration (arrows) and their aggregation (arrowheads). 
c, Laser-ablation of an lhx1a:EGFP+ aggregate (arrow) inhibits nephron 
formation without affecting nephrogenesis of an adjacent aggregate 
(arrowhead). d, Time course of a Tg(lhx1a:EGFP;cdh17:mCherry) larva 


with stem-cell capabilities, although confirmation of this awaits direct 
lineage-tracing experiments. Our in vivo imaging of nephrogenesis and 
chimaeric transplantation results demonstrated that nephrogenic 
aggregates form by the coalescence of multiple lhx1a:EGFP+ cells 
(Supplementary Fig. 1). This process is reminiscent of nephrogenesis 
in mammals and suggests that similar mechanisms govern nephron 
formation in both species. Consistent with this, lhx1a:EGFP+ cells 
express six2a and wt1a, two critical regulators of mammalian nephron 
progenitors. Our observation that only aggregates of lhx1a:EGFP+ 
cells, but not single cells, are capable of engraftment suggests that 
nephron progenitor potential may depend upon a ‘community 
effect’, a phenomenon whereby continued cell contact is necessary 
for cells to respond to an inductive signal. The failure of renal vesicles to 
engraft suggests that nephron-forming potential is lost upon epithelial 
differentiation. 

With our data in hand, it is now possible to pursue whether the 
mammalian adult kidney contains an equivalent population of nephro- 
genic aggregates. If present, these cells are most probably dormant or 
their regenerative abilities blocked, given that nephrogenesis ceases 
around birth. Using zebrafish to understand the molecular identity of 
nephron progenitors and the pathways that regulate them may lead to 
therapeutic ways to activate, or artificially engineer, the mammalian 
counterpart and augment human renal regeneration. With the rise in 
chronic kidney disease becoming a serious worldwide healthcare issue, 
a nephron-progenitor-based regenerative therapy will have a major 
clinical impact. 


METHODS SUMMARY 


For WKM transplants, adult recipient fish were conditioned with intraperitoneal 
injection of gentamicin (80 µg g⁻¹), then immunocompromised with sub-lethal 
γ-irradiation (25 Gy) to prevent graft rejection. Unpurified WKM cells were 
prepared as previously described from Tg(cdh17:EGFP) or Tg(cdh17:mCherry) 
donors that express fluorescent reporter proteins in the distal nephron. For the 
lhx1a:EGFP+ single cell and aggregate transplants, dissected kidneys from 


demonstrating that lhx1a:EGFP+ cells coalesce into an aggregate and 
differentiate into a nephron. e, Time course of a Tg(wt1b:EGFP;pax8:DsRed) 
larva showing development of a wt1b:EGFP+ aggregate into a nephron. 
f, Bright field and fluorescent merge of an aggregate from a 
Tg(lhx1a:EGFP;cdh17:mCherry) donor. g, Donor-derived cdh17:mCherry+ 
nephrons (one indicated by arrow) and multiple single lhx1a:EGFP+ cells 
(arrowhead) resulting from the transplantation of the aggregate shown in 
f (scale bar, 30 µm). PT, pronephric tubule; sb, swim bladder; larvae shown with 
anterior to the left. 


Tg(lhx1a:EGFP;cdh17:mCherry) fish were treated with 10% collagenase/dispase 
for 15 min and cells/aggregates manually transferred with a mouth pipette to a 
drop of 1× PBS/2% fetal calf serum on a glass slide. A single cell or aggregate was 
serially passaged through three droplets of PBS/fetal calf serum until free of non- 
positive cells just before transplantation. 


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 
METHODS 


Zebrafish transgenic lines. Zebrafish were maintained as previously described 
and according to Institutional Animal Care and Use Committee protocols. The 
transgenic lines Tg(wt1b:EGFP), Tg(lhx1a:EGFP) and Tg(hsp70:dn-fgfr1-EGFP) 
were previously reported. The Tg(wt1b:mCherry) line was generated by 
replacement of EGFP with mCherry in the F47 vector, which contains a shortened 
version of the wt1b promoter that was previously described. The Tg(cdh17:EGFP) 
line was generated by isolation of a 4.3-kb genomic fragment upstream of the exon 1 
5′ untranslated region. The promoter fragment was cloned into the XhoI/SalI sites of 
the Tol2 vector T2AL200R150G. The Tg(cdh17:mCherry) line was generated by co- 
injection of Cre mRNA with the Tol2 vector T2cdh17-loxP-EGFP-loxP-mCherry. 
The Tg(pax8:DsRed) line was generated by gene trap screening. DsRedExpress was 
inserted into the BamHI/NotI sites of the pT2KSAG Tol2 vector and was used to 
generate the transgenic line. Mapping of the insertion site by inverse PCR revealed 
that DsRedExpress was inserted in the intron region between exons 1 and 2 (T.I. and 
F.O., unpublished observations). 

Adult and larval zebrafish experiments. Epifluorescent images were taken from 
a Nikon Eclipse 80i microscope using the Hamamatsu ORCA-ER camera and 
confocal images were acquired using the A1 high-speed confocal Ti-e inverted 
microscope system (Nikon). 

Adult. Gentamicin (40 µg), BrdU (100g) and 40 kDa dextran-FITC or 
-rhodamine (2 µg) were administered by intraperitoneal injection. Single 
nephrons were dissected in Ringer’s buffer using sharpened tungsten needles. 
Kidney wholemount in situ hybridization was as previously described (http:// 
zfin.org/ZFIN/Methods/ThisseProtocol.html) with the addition of 1% dimethyl 
sulphoxide supplemented to the fixative. Fluorescence-activated cell sorting and 
analyses were performed using the BD FACSAria (Harvard Stem Cell Institute 
Flow Cytometry Core Facility). For transplantation of cells directly into the kidney, 
conditioned recipients were anaesthetized with 0.02% tricaine and a lateral incision 
was made posterior to the gills and level with the kidney. One microlitre containing 
approximately 5 X 10° WKM cells, or a single lhx1a:EGFP+ cell or a single 
lhx1a:EGFP+ aggregate was injected directly into the head kidney of recipient fish 
(3 days after conditioning) using a Hamilton syringe. The incision was closed with 
a suture and the fish was returned to water (Supplementary Fig. 3a-k). 

Larva. Larval wholemount in situ hybridization was performed as reported. 
Ablation of lhx1a:EGFP+ aggregates was performed with the MicroPoint Laser 
System (Photonic Instruments) in conjunction with the Nikon Eclipse 80i microscope. 
For the cellular necrosis assay, water control and gentamicin-treated kidneys 
were incubated for 10 min in 5 µg ml⁻¹ of acridine orange (Sigma) in PBS, 
washed three times with 50 ml PBS and imaged under bright field and epifluor- 
escence (FITC). 
Histology. Haematoxylin and eosin. Kidneys and larvae were fixed in 4% para- 
formaldehyde, embedded in paraffin, sectioned and stained with haematoxylin 
and eosin or antibodies against EGFP and BrdU (Dana-Farber/Harvard Cancer 
Center Pathology Core Facility). 

Methylene blue and basic fuchsin. Kidneys were fixed in 4% paraformaldehyde, 
embedded in JB4 resin, sectioned and stained with methylene blue and basic fuchsin. 
Microarray analysis. Triplicate samples of approximately 4,000 lhx1a:EGFP+ and 
cdh17:EGFP+ cells were sorted by fluorescence-activated cell sorting into lysis 
buffer, complementary DNAs were amplified, labelled with Cy3 (cdh17:EGFP+) 
and Cy5 (lhx1a:EGFP+), and hybridized to the Agilent Whole Zebrafish Genome 
Oligo Microarray (3 × 44k) by Miltenyi Biotec. All statistical analysis used the R 
package 2.9.2 (http://cran.r-project.org/). Agilent microarrays were processed using 
the Agilent Feature Extraction software to obtain intensity ratios for each of 
the 43,803 probes on the array. Intensity ratios from the three separate Agilent 
microarrays were subsequently quantile normalized using the normalizeBetweenArrays 
function in the affy package and differentially expressed genes between cases 
and controls determined using the limma package. Given the multiple number of 
hypotheses tested we selected the qvalue package for false discovery rate (FDR) 
estimation. RMA normalization, limma and qvalue were used to identify differ- 
entially expressed probe sets across the eight GUDMAP mouse kidney microarrays 
of interest (three for Six2+ cells, five for proximal tubule cells), which were down- 
loaded from the GEO database (GSE12588, GSE6589 and GSE6290). To compare 
the differentially expressed genes from the two platforms (and species), the micro- 
array probes were matched to gene identities using annotation files provided by the 
manufacturers. Zebrafish gene identities were then mapped to mouse orthologues 
using a combination of the InParanoid (http://inparanoid.sbc.su.se/) and Ensembl 
databases. A total of 11,870 Ensembl mouse protein identities were identified that 
could be compared from both arrays. Given the smaller number of zebrafish arrays, 
FDR thresholds of 40% (nominal P < 0.016), 45% (P < 0.048) and 50% (P < 0.11) 
were combined with a minimum twofold change in intensity, which corresponded 
to 1,635, 3,335 and 5,879 probes, respectively. More stringent FDR thresholds of 5% 
(nominal P < 0.022), 10% (P < 0.07) and 15% (P < 0.14) were chosen for the 
mouse probes, leading to 5,250, 5,699 and 5,936 probe sets, respectively. Genes that 
were both significantly upregulated and downregulated in the analysis, as judged by 
separate probes, were excluded, leaving a total of 10,421 genes for comparison. We 
used Fisher’s exact test for statistical significance over a range of FDR values, 
yielding a range from P = 0.051 to P = 0.86. The GEO database record number 
for the zebrafish microarray data is GSE24803. 
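
As an illustration of the overlap test described above, the following is a minimal sketch of a one-sided Fisher's exact test computed directly from the hypergeometric distribution. It is not the authors' R code: the function name, the choice of a one-sided alternative and the example counts are assumptions made purely for illustration.

```python
from math import comb


def fisher_exact_overlap(overlap, size_a, size_b, total):
    """One-sided Fisher's exact P value for the overlap of two gene sets.

    Returns P(X >= overlap), where X is the number of shared genes expected
    by chance when sets of size_a and size_b are drawn from `total` genes.
    """
    largest = min(size_a, size_b)
    # Sum the hypergeometric numerators for every overlap at least as large
    # as the observed one; exact integer arithmetic avoids rounding error.
    favourable = sum(
        comb(size_a, k) * comb(total - size_a, size_b - k)
        for k in range(overlap, largest + 1)
    )
    return favourable / comb(total, size_b)


# Hypothetical counts (not from the paper): 20 comparable genes,
# 5 upregulated in each species, 3 of them shared.
p_value = fisher_exact_overlap(overlap=3, size_a=5, size_b=5, total=20)
print(f"P = {p_value:.3f}")
```

In practice the same function would be called once per pair of FDR thresholds, with the total set to the number of genes comparable across both arrays.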

Quantitative PCR. lhx1a:EGFP+ and cdh17:EGFP+ cells (300-9,000) were purified 
by fluorescence-activated cell sorting, lysed and complementary DNA generated 
using the Cells-to-cDNA kit (Ambion). Quantitative PCR was performed in triplicate 
using SYBR Green chemistry on a Mastercycler RealPlex PCR machine 
(Eppendorf) using the following primer sets: slc20a1a forward 
5′-TCTCTGGGACACATTGCATC-3′, reverse 5′-AGCAGTTCCAGCCATTTGAC-3′; slc13a1 
forward 5′-TGCTGGGATTCCTGTTCTTC-3′, reverse 5′-AAACCTCCACCAACAAGCAG-3′; 
slc12a1 forward 5′-TCAACGCTCTGAAGAAGCTG-3′, reverse 
5′-ACGTTGTGTGGGTTTCTTCC-3′; slc12a3 forward 5′-ACAGATCCGGCTGAATGAAG-3′, 
reverse 5′-AGCCAAGCCATGTAAAGAGG-3′; six2a forward 
5′-AGCTCGGAGGATGAGTTTTC-3′, reverse 5′-ATGGTGCCTTGCAGAAGAAG-3′; 
wt1a forward 5′-AGCCAACCAAGGATGTTCAG-3′, reverse 
5′-AACCTTIGATTCCTGGAGCTG-3′. ef1a was used as the normalization control: forward 
5′-CTGGAGGCCAGCTCAAACAT-3′, reverse 5′-ATCAAGAAGAGTAGTACCGCTAGCATTAC-3′. 
Relative quantification of target gene expression was 
evaluated using the comparative CT method. A mean and standard deviation were 
determined for the ΔCT value for all genes of interest. The error in fold-change was 
obtained by considering the effect of an increase or decrease of one standard 
deviation in ΔCT value. 
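
The comparative CT calculation and the one-standard-deviation error bounds described above can be sketched as follows. This is an illustrative reading of the method, not the authors' own analysis: the function names and all CT values are hypothetical, ef1a is taken as the normalization control named in the text, and cdh17:EGFP+ cells are assumed to serve as the calibrator population.

```python
from statistics import mean, stdev


def delta_ct(ct_target, ct_control):
    """Per-replicate delta-CT: target-gene CT minus normalization-control CT."""
    return [t - c for t, c in zip(ct_target, ct_control)]


def fold_change_with_error(dct_sample, dct_calibrator):
    """Fold change by the comparative CT method, with +/- one SD bounds.

    fold = 2 ** -(mean delta-CT of sample - mean delta-CT of calibrator).
    The bounds shift the sample's mean delta-CT up or down by one standard
    deviation before converting to fold change (an assumed interpretation).
    """
    m = mean(dct_sample)
    sd = stdev(dct_sample)
    ref = mean(dct_calibrator)
    fold = 2.0 ** -(m - ref)
    low = 2.0 ** -((m + sd) - ref)   # higher delta-CT means lower expression
    high = 2.0 ** -((m - sd) - ref)
    return fold, low, high


# Hypothetical triplicate CT values (not from the paper).
dct_lhx1a = delta_ct([22.0, 23.0, 24.0], [20.0, 20.0, 20.0])   # six2a vs ef1a
dct_cdh17 = delta_ct([27.0, 27.0, 27.0], [20.0, 20.0, 20.0])
fold, low, high = fold_change_with_error(dct_lhx1a, dct_cdh17)
print(f"fold change {fold:.1f} (range {low:.1f} to {high:.1f})")
```

Because the error is propagated through an exponent, the resulting bounds are asymmetric around the fold change.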
Brain changes in response to nerve damage or cochlear trauma can 
generate pathological neural activity that is believed to be responsible 
for many types of chronic pain and tinnitus. Several studies have 
reported that the severity of chronic pain and tinnitus is correlated 
with the degree of map reorganization in somatosensory and auditory 
cortex, respectively. Direct electrical or transcranial magnetic stimulation 
of sensory cortex can temporarily disrupt these phantom sensations. 
However, there is as yet no direct evidence for a causal role of 
plasticity in the generation of pain or tinnitus. Here we report evidence 
that reversing the brain changes responsible can eliminate the 
perceptual impairment in an animal model of noise-induced tinnitus. 
Exposure to intense noise degrades the frequency tuning of auditory 
cortex neurons and increases cortical synchronization. Repeatedly 
pairing tones with brief pulses of vagus nerve stimulation completely 
eliminated the physiological and behavioural correlates of tinnitus in 
noise-exposed rats. These improvements persisted for weeks after the 
end of therapy. This method for restoring neural activity to normal 
may be applicable to a variety of neurological disorders. 

Damage to the peripheral nervous system causes plasticity in multiple 
regions of the central nervous system. Significant changes have 
been reported in map organization, spontaneous activity, neural synchronization 
and stimulus selectivity. The ideal method of testing 
whether map plasticity or some other form of plasticity is directly 
responsible for chronic pain and tinnitus would be to reverse the plas- 
ticity and evaluate the perceptual consequence. 

Recent attempts to use sensory exposure or discrimination training 
to reverse the map changes in individuals with tinnitus or chronic pain 
have provided some temporary relief. Although the clinical benefits 
were limited, these studies provide some support for the hypothesis 
that neural plasticity could be used to treat these conditions. It is 
possible that a long-lasting reversal of the pathological plasticity in 
these patients would provide significant relief. 

Studies in animals have shown that repeatedly pairing sensory stimuli 
with electrical stimulation of the cholinergic nucleus basalis generates 
powerful and long-lasting changes in cortical organization. In principle, 
this method could be used to reverse the effect of pathological plastic 
changes that are associated with tinnitus and chronic pain. However, 
nucleus basalis stimulation is highly invasive and, thus, not practical for 
clinical use. We have developed a less invasive method for generating 
targeted neural plasticity by pairing vagus nerve stimulation (VNS) with 
sensory inputs, and have demonstrated a potential clinical application. 

VNS triggers the release of neuromodulators known to promote plastic 
changes. The efficacy of VNS in enhancing plasticity seems to lie in 
the synergistic action of multiple neuromodulators acting in the cerebral 
cortex and other brain regions. VNS improves learning and memory of 
associated events in rats and humans using identical VNS parameters. 

Our study tests the hypothesis that the pairing of VNS with tones could 
be used to drive neural plasticity that would reverse the behavioural 
correlate of tinnitus in noise-exposed rats. The first set of experiments 
confirms that repeatedly pairing a single tone frequency with VNS is 
sufficient to generate specific and long-lasting changes in cortical maps. 


The rationale for our tinnitus therapy is that increasing the number of 
cortical neurons tuned to frequencies other than the tinnitus frequency 
ought to reduce the overrepresented tinnitus frequency. The second set of 
experiments confirms that repeatedly pairing a range of tone frequencies 
with VNS can be used to reverse the behavioural and neural correlates of 
tinnitus in noise-exposed rats. 

In our first set of experiments, we sought to evaluate whether pairing 
VNS with tones can generate precise, long-lasting and large-scale changes 
in the frequency representation in the cortex, as we found for nucleus 
basalis stimulation. We paired VNS with a 9-kHz, 60-dB SPL tone (n = 8 
rats) or a 19-kHz, 50-dB SPL tone (n = 5 rats) for 20 days (SPL, sound 
pressure level), 300 times per day in normal-hearing rats with cuff electrodes 
implanted on the left cervical vagus nerve (Methods). The VNS-tone 
pairing procedure was identical to earlier tone pairing procedures 
with nucleus basalis, ventral tegmentum or locus coeruleus stimulation 
that generate long-lasting map plasticity. VNS parameters (30 Hz, 
0.8 mA) were similar to the parameters used in previous rat and human 
VNS studies, except that the duration of stimulation and the widths of 
individual pulses were reduced by 60-fold and fivefold, respectively 
(Methods and Supplementary Fig. 2). The 0.5 s of VNS used in this study 
was sufficient to reduce the amplitude of the cortical electroencephalo- 
gram briefly (Supplementary Fig. 3 and supplemental data). Twenty-four 
hours after the last VNS-tone pairing session, we used standard micro- 
electrode mapping techniques to document frequency map plasticity. 
VNS-tone pairing caused a 70-79% increase in the number of primary 
auditory cortex (A1) sites with a characteristic frequency near the paired 
tone frequency (Fig. 1). This result confirms our hypothesis that VNS-tone 
pairing can be used to direct map plasticity lasting more than 24 h. 

Pairing VNS with sensory stimuli is a potentially attractive method 
of modifying neural circuits without significant side effects. VNS is well 
tolerated in the 50,000 patients who currently receive VNS therapy for 
epilepsy or depression. By pairing tones with brief trains of VNS, we 
have been able to alter cortical frequency maps significantly in rats 
using only 1% of the VNS that is delivered clinically (that is, 30 s every 
5 min, 24 h per day) for epilepsy treatment in humans. 

Having demonstrated that VNS can be used to generate specific and 
long-lasting map plasticity, in our second set of studies we sought to 
evaluate whether VNS-directed plasticity could be adapted to renormalize 
pathological plasticity and eliminate tinnitus. Exposure to intense, high- 
frequency noise is known to generate an overrepresentation of mid- 
frequency tones, degrade frequency selectivity and increase excitability 
and synchronization of auditory neurons. We induced noise 
trauma by exposing rats to 1 h of 115-dB SPL, octave-band noise 
centred at 16 kHz (ref. 17; Methods). Auditory brainstem responses 
were used to confirm the effects of the noise exposure on hearing 
threshold, including temporary deafness for frequencies above 8 kHz 
and a long-lasting increase of auditory brainstem response thresholds 
and latency (Supplementary Figs 4 and 5). After noise exposure, twice 
as many A1 recording sites were tuned to frequencies between 2 and 
4 kHz in comparison with naive controls (35 ± 7% versus 14 ± 2%, 
P < 0.05), and very few neurons responded to frequencies above 
Figure 1 | VNS-tone pairing causes map plasticity. Repeatedly pairing VNS 
with a tone increases the number of A1 recording sites tuned to the paired 
frequency. a, VNS was paired with a 9-kHz tone 6,000 times over 20 days in 
eight rats. b, VNS was paired with a 19-kHz tone in five rats. This group heard 
4-kHz tones equally often but without VNS pairing. Asterisks indicate 
significant (P < 0.05) increases in the fraction of A1 sites with characteristic 
frequencies near the paired tone. Error bars, s.e.m. This result in normal- 
hearing rats suggested that VNS-tone pairing might be used to reverse the map 
distortions induced by exposure to intense noise. 


23 kHz (1.7 ± 1% versus 11.5 ± 3%, P < 0.01). The average frequency 
bandwidth of A1 neurons increased by 21% (1.75 ± 0.04 versus 
1.47 ± 0.03 octaves at 10 dB above threshold, P < 0.00001), and the 
average number of spikes evoked by a tone within each site’s receptive 
field increased by 30% (4.3 ± 0.1 versus 3.3 ± 0.1, P < 0.00001). The 
average spontaneous rate increased by 23% (17.7 ± 0.6 versus 
14.3 ± 0.4 Hz, P < 0.00001). The degree of synchronization during 
silence measured using the correlation coefficient between multiunit 
activity recorded at nearby sites was significantly increased (1.7 ± 0.01 
versus 0.19 ± 0.01 synchronous spikes per second of silence, P < 0.05; 
Methods). These changes in frequency tuning and synchronization are 
similar to the physiological changes observed after noise exposure that 
have been proposed to be directly responsible for tinnitus. Earlier 
studies using several different methods have documented that noise 
exposure can generate behavioural correlates of tinnitus near the low- 
frequency edge of the noise trauma. However, few studies have 
directly compared neurophysiology and behavioural observations 
from the same animals. It was therefore of great interest to us to 
relate noise-induced plasticity to perceptual disturbances. 

Each of the eighteen noise-exposed rats used in this study was 
significantly impaired in its ability to detect a gap in narrowband noise 
centred on 8 or 10 kHz, but showed no impairment when the gap 
occurred in narrowband noise centred on 2 or 4 kHz or in broadband 
noise (Fig. 2, 4 weeks after exposure). Several studies have concluded 
that a frequency-specific impairment in gap detection is a likely sign 
that noise-exposed rats experience a mid-frequency tinnitus percept 
which fills the silent gaps (Methods and Supplementary Figs 6-9). 
Although it is not possible to evaluate the subjective experience of rats 
definitively, the gap impairment has been taken as a possible beha- 
vioural correlate of tinnitus. 

Map distortion and tuning curve broadening (but not changes in 
spontaneous activity or synchronization) were significantly correlated 
with the degree of gap impairment in untreated noise-exposed rats 
(R > 0.7 (Pearson correlation coefficient), P < 0.05, n = 8 sham rats; 
Figs 3a, b and 4a-d and Supplementary Fig. 13). These correlations 
must be interpreted with caution because any variability in the initial 
cochlear trauma could generate a correlation between neural and 
behavioural changes even in the absence of a causal relationship. 
Though still not definitive, the best test for a causal relationship would 
be to reverse specifically the plasticity generated by noise exposure and 
document the reversal of the gap detection impairment. 
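
For readers unfamiliar with the statistic quoted above, the following is a minimal sketch of how a Pearson correlation coefficient between a neural measure and a behavioural measure could be computed. The function name and the per-rat values are hypothetical and are not data from this study; the sketch only illustrates the calculation, and, as noted in the text, a high coefficient does not by itself establish causation.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations about the means.
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / sqrt(s_xx * s_yy)


# Hypothetical values for eight rats (illustrative only):
# map distortion (% of A1 sites) and gap impairment (arbitrary units).
map_distortion = [12.0, 18.0, 25.0, 30.0, 33.0, 40.0, 44.0, 52.0]
gap_impairment = [0.10, 0.22, 0.20, 0.35, 0.41, 0.38, 0.55, 0.60]
print(f"R = {pearson_r(map_distortion, gap_impairment):.2f}")
```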
Figure 2 | VNS/multiple tone pairing eliminates the behavioural correlate 
of tinnitus. Four weeks after noise exposure, each of the rats in both groups 
was unable to detect a gap in one or more of the narrowband noises tested 
(P > 0.05; Supplementary Fig. 8b). The frequency with the greatest impairment 
four weeks after noise exposure is the putative tinnitus frequency for each rat. 
For both groups, gap detection at the putative tinnitus frequency was 
significantly impaired in comparison to broadband noise (P < 0.05). The gap 
detection at the non-tinnitus frequency is based on gap detection in 16-kHz 
narrowband noise. a, Gap detection at the putative tinnitus frequency (dotted 
line) improved significantly after ten days of VNS-tone pairing, and the 
improvement persisted at least until the acute physiology experiment (n = 5 
rats). b, The sham group (n = 9 rats) continued to be impaired. Two sham rats 
did not contribute data at the non-tinnitus frequency because they showed gap 
impairments at 16 kHz (as well as 8 and 10 kHz) four weeks after noise 
exposure. Black and grey horizontal bars represent duration of VNS and sham 
therapy, respectively. Asterisks represent significant differences (P < 0.05) in 
gap detection at the putative tinnitus frequency between VNS therapy and 
sham therapy rats. Error bars, s.e.m. 


We speculated that pairing VNS with randomly interleaved pure 
tones that span the rat hearing range, but exclude the overrepresented 
frequencies, could decrease the cortical representation of the excluded 
frequencies. We also expected that pairing multiple tone frequencies 
with VNS (‘VNS/multiple tone’ pairing) would increase frequency 
selectivity and decrease synchronization as in our earlier nucleus basalis 
stimulation experiments. We quantified behavioural and physiological 
correlates of tinnitus in noise-exposed rats and then tested 
whether pairing VNS with multiple tone frequencies could reverse 
the pathological plasticity and eliminate the perceptual disturbance in 
these rats. 

VNS was repeatedly paired with multiple pure tones 300 times per 
day for 18 days in seven noise-exposed rats with impaired gap detec- 
tion for mid-frequency sounds (Methods). Because we found that gap 
impairment occurred at 8-10 kHz, we selected the frequency of each 
randomly interleaved tone to be 1.3, 2.2, 3.7, 17.8 or 29.9 kHz. This 
pairing procedure was chosen because previous studies suggest it 
would reduce the cortical response to mid-frequency tones, increase 
frequency selectivity and decrease cortical synchronization. After 
ten days of therapy, each of the seven rats showed a significant startle 
reduction in cued trials relative to uncued trials for every frequency 
tested (P < 0.05; Fig. 2a and Supplementary Fig. 9a). Thus, pairing of 
VNS with multiple tones reversed the behavioural effect of noise expo- 
sure, which suggests that the rats’ presumed tinnitus was no longer 
present. In contrast, rats in the sham therapy group showed a consistent 


©2011 Macmillan Publishers Limited. All rights reserved 


[Figure 3 panels: a, Naive rats; b, Noise-exposed rats after sham therapy; c, Noise-exposed rats after VNS-tone therapy. Axes: characteristic frequency (kHz) versus intensity (dB SPL); colour scale, percentage of A1 responding.] 


Figure 3 | VNS/multiple tone pairing reverses map distortion. The increased 
response of A1 neurons to tones following noise exposure is reversed by VNS/ 
multiple tone pairing. a, Colour indicates the percentage of A1 neurons in naive 
rats that respond to a tone of any frequency and intensity combination. 
b, Percentage of A1 neurons that respond to each tone in noise-exposed rats that 
received sham therapy. c, Percentage of A1 neurons that respond to each tone in 
noise-exposed rats that received the VNS/multiple tone therapy. Black contour 
lines indicate 20, 40, and 60% responses. The white lines in b surround the regions 
of tones that are significantly increased (P < 0.01) in comparison with naive rats. 
The white lines in c indicate significant decreases (P < 0.01) in comparison with 
noise-exposed sham therapy rats. The filled white circles indicate the tone for 
which the increase in the number of cortical neurons was greatest, which is used 
to quantify the degree of map distortion in Fig. 4a, b. The filled black circles 
indicate the tone for which the proportional increase was greatest. 


impairment in their ability to detect gaps in the putative tinnitus
frequency (Fig. 2b). Each of the nine rats that received sham therapy (tones
with no VNS, VNS with no tones or no therapy; n = 4, 2, 3 rats, respectively)
failed to show a significant startle reduction in cued trials
(P > 0.05; Supplementary Fig. 9b) for at least one of the frequencies
tested at each time point.

In the rats that received VNS paired with multiple tones, the impair- 
ment in gap detection was also eliminated when measured one day, 
one week and three weeks after the end of the therapy. This impair- 
ment was maintained in all three control groups at every time point 
tested (Supplementary Figs 9 and 10). These results indicate that pair- 
ing VNS with multiple tone frequencies is sufficient to eliminate the 
gap impairment induced by noise exposure (Supplementary Fig. 11). 
This is the first method reported to generate a long-lasting reversal of a 
behavioural correlate of chronic tinnitus. 

Three weeks after the end of VNS/multiple tone pairing or sham ther- 
apy, we evaluated the physiological properties of the auditory cortex of 
each rat to determine whether the restored behaviour in the treated group 
was due to renormalization of the auditory cortex. After VNS/multiple 
tone pairing, most of the A1 properties that were degraded by noise
exposure returned to normal levels. For example, the proportion of A1
neurons with characteristic frequencies between 12 and 23 kHz was
indistinguishable from that in naive controls after VNS/multiple tone
treatment (naive, 20 ± 2%; sham, 15 ± 5%; therapy, 30 ± 9%; Supplementary
Figs 12 and 13a). The proportion of A1 neurons responding to 4-kHz,
70-dB SPL tones significantly increased relative to naive controls in sham
rats and returned to normal levels in rats that had received the therapy
three weeks earlier (naive, 45.4 ± 5.0%; sham, 74.1 ± 7.6%; therapy,
49.1 ± 6.6%; Figs 3 (white circles) and 4a). The degree of low-frequency
map distortion was positively correlated with the degree of gap
impairment observed in individual rats (Fig. 4b and Supplementary Fig. 13b).
The percentage of cortex responding to 8-kHz, 30-dB SPL tones (Fig. 3,
black circles) was also well correlated with the gap detection impairment
(R² = 0.51, P = 0.006). These results support the earlier hypothesis that
changes in cortical maps are causally related to tinnitus.

Figure 4 | Neurophysiological properties of naive, sham and therapy rats.
a, c, e, g, i, Noise exposure caused a significant map distortion (a), decreased
frequency selectivity (c), increased the tone-evoked response (e), increased the
spontaneous rate (g) and increased the degree of cortical synchronization (i).
VNS/multiple tone pairing returned each of these parameters, except spontaneous
activity, to normal levels. b, d, f, Map organization (b), frequency selectivity (d) and
tone-evoked response strength (f) were all correlated with the degree of gap
impairment in individual rats. h, j, Spontaneous activity (h) and synchronization
(j) were not significantly correlated with gap impairment. Each rat's gap detection
ability was quantified as the average gap detection at the putative tinnitus
frequency of each rat, averaged across the four time points collected after the
beginning of therapy (Fig. 2). Error bars, s.e.m. Asterisks represent significant
differences compared with naive rats (P values as indicated). Triangles and circles
represent rats from the sham and therapy groups, respectively.

VNS/multiple tone pairing reversed the increase in the width of frequency
tuning of A1 multiunit activity (that is, decreased frequency
selectivity) observed in noise-exposed rats (Fig. 4c). The bandwidth
(measured at 10, 20, 30 or 40 dB above threshold) averaged across all
A1 sites was highly correlated with the degree of gap impairment (Fig. 4d
and Supplementary Fig. 14), thus supporting the earlier hypothesis that
decreased frequency selectivity is causally related to tinnitus.

VNS/multiple tone pairing reversed the increase in cortical excitability
observed in noise-exposed rats (Fig. 4e). The average number of spikes
evoked by tones within each site's receptive field was weakly correlated
with the degree of impairment of gap detection (Fig. 4f), supporting the
earlier hypothesis that tinnitus is related to increased excitability of cortical
neurons.

Finally, VNS/multiple tone pairing also reversed the increase in cor- 
tical synchronization observed in noise-exposed rats, but did not reverse 
the increase in cortical spontaneous activity observed in noise-exposed 
rats (Fig. 4g, i). There was a trend for the degree of synchronization to be 
correlated with the degree of gap impairment, but no correlation between 
the rate of spontaneous activity and the degree of gap impairment 
(Fig. 4h, j). Our observation that noise-induced increases in spontaneous 
activity and synchronization are not significantly correlated with
behavioural correlates of tinnitus in individual rats is consistent with earlier
reports. However, given the potential for small changes in anaesthesia
level to influence spontaneous activity and synchronization in the cortex, 
it remains a possibility that these factors contribute to tinnitus. 

Hearing loss, hyperacusis and tinnitus often result from noise expo- 
sure and could contribute to the gap impairments observed in this study. 
Our results confirm that exposure to intense, high-frequency noise 
causes pathological plasticity that is well correlated with the inability 
to detect a gap in a mid-frequency, 65-dB SPL tone. Correlations alone 
do not suggest that these changes cause tinnitus because another con- 
founding factor (such as variability in the degree of cochlear trauma) 
could cause both variables to be correlated without a causal connection. 
By randomizing the treatment of rats with identical noise exposure, we 
were able to eliminate the potential confound caused by variability in the 
response to noise exposure. Thus, our observation that pairing multiple 
tone frequencies with VNS can reverse both the neural and behavioural 
correlates of tinnitus provides good evidence that abnormal activity in 
the central auditory system is responsible for the subjective experience of 
tinnitus. In addition, neural correlates of hearing loss (tone thresholds) 
and hyperacusis (rate level functions) were not correlated with gap 
impairment in the rats tested (Supplementary Information). Thus, it is 
reasonable to conclude that the gap impairments observed in this study 
are primarily related to tinnitus. 

VNS-directed plasticity represents a potentially powerful approach 
to treating tinnitus. Unlike pharmaceutical approaches, this method 
provides the possibility of generating long-lasting and stimulus-spe- 
cific changes to neural circuits with minimal side effects. Our control 
experiments demonstrate that VNS-directed plasticity is driven by the 
repeated association of VNS with tones, and not by VNS alone. 
Additional studies are needed to determine whether the pairing of 
other sensory events with brief periods of VNS could be used to reverse 
the pathological plasticity associated with other common neurological 
conditions, such as chronic pain and amblyopia. 


METHODS SUMMARY 


The VNS-tone pairing protocols, noise exposure procedure, gap detection testing, 
neurophysiology techniques and analysis are described in Methods. The noise 
exposure procedure, gap detection testing, and neurophysiology techniques were 
identical to those in earlier reports.


Full Methods and any associated references are available in the online version of
the paper at www.nature.com/nature.

METHODS


VNS surgical protocol. Female Sprague-Dawley rats (250-350 g) were implanted
with a platinum-iridium bipolar cuff electrode around the left cervical vagus nerve.
As in humans, only the left vagus nerve was stimulated because the right vagus
nerve contains efferents that stimulate the sinoatrial node and can cause cardiac
complications. Leads from the electrode were tunnelled subcutaneously to the top
of the head. A four-channel connector was used to deliver current to the stimulating
electrode and monitor the electroencephalogram (EEG) during daily VNS sessions.
Bone screws placed over the vertex and the cerebellum were used to record auditory
brainstem responses (ABRs) and EEG. Each rat was given antibiotics to prevent
infection and a single dose of atropine and dexamethasone to reduce fluid
accumulation in the lungs immediately after completion of the surgery.

VNS stimulation parameters and single-tone pairing procedures. VNS was
delivered to unanaesthetized, unrestrained rats in a 25 × 25 × 25 cm³ wire cage,
located inside a 50 × 60 × 70 cm³ chamber lined with acoustic insulating foam. A
pilot study was conducted to determine the minimal VNS parameters that reliably
reduced EEG amplitude during slow-wave sleep (Supplementary Fig. 3). VNS
parameters were identical for every rat in this study. Each 100-µs, charge-balanced
biphasic pulse was delivered with a current of 0.8 mA. The stimulation was delivered
as a train of 15 pulses at 30 Hz (500-ms train duration). Cuff impedances were
measured daily (~5 kΩ). The impedance for three rats was unusually high after
implantation and these rats were assigned to the tone-alone and no-therapy groups.
The impedance was stable across the duration of training for all other rats. The
500-ms pure tones began 150 ms after the onset of the VNS train (Supplementary
Figs 1 and 2). For our earlier nucleus basalis stimulation studies, stimulation
beginning either 200 ms before tone onset or 50 ms after tone onset generated
indistinguishable map plasticity.
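
The timing relationships stated above can be checked with a short calculation. The following is a minimal sketch, not the authors' stimulation code: the function and variable names are hypothetical, and it assumes that the stated 500-ms train duration counts one full inter-pulse period for each of the 15 pulses.

```python
# Timing sketch for one VNS-tone pairing, using values stated in the Methods.
# All names are illustrative; they are not taken from the original study.
PULSES_PER_TRAIN = 15      # pulses in each VNS train
PULSE_RATE_HZ = 30.0       # pulse repetition rate
TONE_DELAY_MS = 150.0      # tone onset relative to VNS train onset
TONE_DURATION_MS = 500.0   # duration of each pure tone


def pulse_onsets_ms(n_pulses=PULSES_PER_TRAIN, rate_hz=PULSE_RATE_HZ):
    """Onset time of each pulse in ms, measured from the start of the train."""
    period_ms = 1000.0 / rate_hz
    return [k * period_ms for k in range(n_pulses)]


def train_duration_ms(n_pulses=PULSES_PER_TRAIN, rate_hz=PULSE_RATE_HZ):
    """Nominal train duration, assuming one full period per pulse."""
    return n_pulses * 1000.0 / rate_hz


onsets = pulse_onsets_ms()
tone_start_ms = TONE_DELAY_MS
tone_end_ms = TONE_DELAY_MS + TONE_DURATION_MS
# Time during which the VNS train and the tone are both on.
overlap_ms = min(train_duration_ms(), tone_end_ms) - tone_start_ms
```

Under these assumptions the last pulse begins about 467 ms into the train, and the tone and the train overlap for 350 ms.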

VNS was delivered 300 times per day for 20 days, during a VNS-tone pairing
session that lasted 2.5 h (Supplementary Fig. 2). To prevent rats from anticipating
stimulation timing, there was a 50% chance that VNS would be delivered every 15 s.
Twenty-four hours after the last pairing, rats were anaesthetized with pentobarbital
and the right auditory cortex was exposed to allow for high-density extracellular
microelectrode mapping.
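
The stated session length follows from the delivery schedule. As a rough check, assuming that each 15-s slot is an independent trial with a 50% chance of VNS delivery (the function name below is hypothetical), the expected duration is:

```python
def expected_session_hours(pairings, slot_interval_s, p_delivery):
    """Expected session length if each slot delivers VNS with probability p."""
    expected_slots = pairings / p_delivery   # slots needed on average
    return expected_slots * slot_interval_s / 3600.0


# 300 pairings, one slot every 15 s, 50% chance of delivery per slot.
session_h = expected_session_hours(300, 15.0, 0.5)
```

With these values, 600 slots of 15 s each are needed on average, which gives 2.5 h and matches the reported session duration.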

One group of rats (n = 8) was exposed to a single 9-kHz, 60-dB SPL tone paired
with VNS. No sound was presented when VNS was not delivered. A second group
(n = 5) was exposed to a 19-kHz, 50-dB SPL tone paired with VNS. During the
trials in which no VNS was delivered (50%), a 4-kHz, 50-dB SPL tone was
presented. As a result, a 19- or 4-kHz tone was delivered every 15 s. Frequency and
intensity calibrations were performed with an ACO Pacific microphone
(PS9200-7016) and Tucker-Davis Technologies SIGCAL v4.2 software. The free-field
tones were presented from a speaker (Optimus) suspended 20 cm above the wire cage.
All paired tones had a 5-ms rise-fall time. The intensity of every tone was selected
to be approximately 20 dB SPL above the rat hearing threshold.

Noise exposure and ABRs. Twenty-eight experimental and control rats were
barbiturate-anaesthetized and exposed to 16-kHz, 115-dB SPL, octave-band noise
for 1 h (refs 17, 20). A single speaker was positioned 5 cm from the left ear. No ear
plugs were used to restrict the noise exposure to one ear. Bilateral noise exposure
was used because it best approximates the noise exposure that occurs in humans. To
confirm cochlear trauma, elevated thresholds were quantified using ABRs in ten
rats under pentobarbital anaesthesia before noise exposure, immediately after
exposure and 11 weeks after noise exposure (when the auditory cortex was mapped). For
ABR recordings, the speaker was positioned 10 cm from the left ear and pure tones
(10 ms long, 2.5-ms rise-fall time) were delivered at a rate of 20 Hz. Tone
frequencies were 4, 10, 16 and 32 kHz in 10-dB steps from 0 to 85 dB SPL. Tones were
randomly interleaved with 1,500 repeats for each frequency-intensity combination.
The signals were filtered from 100 to 3,000 Hz and recorded using BRAINWARE
v8.12 (Tucker-Davis Technologies). Threshold was defined as the lowest 10-dB SPL
step at which an ABR could be recognized (Supplementary Fig. 4).

Gap detection testing. The Turner gap detection method was used to assess a
behavioural correlate of tinnitus in every noise-exposed rat (Supplementary Figs
6-8). This method has previously been cross-validated with a conditioned lever
suppression task (R = 0.75) and a licking suppression task. The gap detection
method was selected because it avoids the need for food or water deprivation,
electric shock or months of behavioural training. Testing took place in a
20 × 20 × 20 cm³ wire-mesh cage in a 67 × 67 × 67 cm³ chamber lined with 5-cm acoustic
foam. The cage was placed on a startle platform (Lafayette Instrument Co.) that
used a piezoelectric transducer to generate a continuous record of downward force.
Sounds were generated using System 3 hardware and software (Tucker-Davis
Technologies) and were delivered by a speaker (Tucker-Davis Technologies
FF1) mounted 20 cm above the cage. Rats underwent gap detection testing with
different band-pass-filtered (1,000-Hz bandwidth) sounds centred at 2, 4, 8, 10, 16,
20 and 24 kHz at 65 dB SPL (ref. 17). Startle responses were elicited by a 20-ms
burst of white noise at 100 dB SPL. In 50% of trials, a 50-ms gap embedded in the
continuous sound served as a warning of a subsequent startling noise and allowed
rats to reduce the amplitude of the response (Supplementary Fig. 7b). The gap in
the narrowband noise began 100 ms before the onset of the broadband startling
noise. Rats underwent 30 trials during each session. The order of sessions with
different continuous sounds was counterbalanced across rats. The interval between
each startle sound was 30-35 s.

In untreated noise-exposed rats, gaps in a specific narrowband sound (usually 8 
or 10 kHz) did not serve as an effective warning, presumably because the ongoing 
tinnitus percept prevented the rats from detecting the silent gap. Thus, the animals 
were not warned that a loud startling noise was coming and exhibited a strong 
startle response (Supplementary Figs 7b and 8b). Gap detection was quantified as 
one minus the ratio of the startle amplitude when the startling noise was preceded 
by a gap in the 65-dB SPL, continuous narrowband sound to the startle amplitude 
when the startling noise was not preceded by a warning gap. Supplementary Fig. 8 
shows typical data from one noise-exposed rat for a session in which the noise 
burst was cued with a gap in broadband noise (left) and a session in which a gap in 
an 8-kHz tone served as the warning cue (right). The warning gap typically 
reduced the startle amplitude by 60-70% (Supplementary Fig. 8a). In noise- 
exposed rats, gaps in the narrowband noise centred near the low edge of the 
trauma noise typically reduced the startle amplitude by less than 20%, which is 
not a statistically significant reduction (Supplementary Fig. 8b). The same pro- 
cedure was also administered using gaps in 65-dB SPL broadband noise as warning 
cues of the startling noise (Supplementary Fig. 7a). The frequency with the greatest 
impairment four weeks after noise exposure is the putative tinnitus frequency for 
each rat (Fig. 2). 
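
The gap detection measure defined above (one minus the ratio of cued to uncued startle amplitude) can be written as a short function. This is a minimal sketch rather than the authors' analysis code: the function name and the example amplitudes are hypothetical, and averaging amplitudes across trials before taking the ratio is an assumption.

```python
from statistics import mean


def gap_detection(cued_amplitudes, uncued_amplitudes):
    """Return 1 - (mean cued startle amplitude / mean uncued startle amplitude).

    Values near 1 mean the gap strongly suppressed the startle response;
    values near 0 mean the gap gave no effective warning.
    """
    uncued = mean(uncued_amplitudes)
    if uncued <= 0:
        raise ValueError("uncued startle amplitude must be positive")
    return 1.0 - mean(cued_amplitudes) / uncued


# Hypothetical amplitudes in arbitrary force units, for illustration only.
intact = gap_detection([0.30, 0.40, 0.35], [1.0, 1.1, 0.9])
impaired = gap_detection([0.90, 0.85, 0.95], [1.0, 1.1, 0.9])
```

In this illustration the first case gives a reduction of about 65%, within the 60-70% range described for an effective warning gap, and the second gives about 10%, below the 20% level described for impaired gap detection.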

Thirty-six rats were initially tested using the gap startle task for inclusion in this 
study. Five rats were excluded from the study because they showed no detectable 
startle response to the noise burst. Of the 31 remaining rats, three were excluded 
because their startle responses were unusually variable. Twenty-eight rats received 
noise exposure. Eighteen of these showed a statistically significant impairment in 
the detection of gaps in one or both mid-frequency (8- or 10-kHz) narrowband 
sounds tested, relative to gap detection before noise exposure (P < 0.05). Three 
rats were excluded from further study because they no longer showed a startle 
response to the noise burst (that is, could no longer detect the startle stimulus). 
Seven rats were excluded from further study because they showed no impairment 
in gap detection (that is, no evidence of tinnitus). Our observation that gap
impairments do not always result from noise exposure is consistent with human and
animal studies showing that although hearing loss is common in individuals with
tinnitus, the majority of individuals with hearing loss do not have tinnitus.
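
The inclusion and exclusion counts in the preceding paragraph can be tallied as a consistency check. The sketch below uses only the numbers stated in the text; the variable names are hypothetical.

```python
# Counts taken from the text; variable names are illustrative.
tested = 36
excluded_no_startle = 5        # no detectable startle response
excluded_variable = 3          # unusually variable startle responses
noise_exposed = tested - excluded_no_startle - excluded_variable

excluded_lost_startle = 3      # no startle response after noise exposure
excluded_no_impairment = 7     # no impairment in gap detection
included = noise_exposed - excluded_lost_startle - excluded_no_impairment
```

The tally gives 28 noise-exposed rats and 18 rats included in the study, in agreement with the counts reported in the text.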

Each of the eighteen rats included in this study showed a significant impairment
in its ability to detect a gap in narrowband noise centred on 8 kHz (16 of 18) or
10 kHz (12 of 18). None of the 18 rats showed a significant impairment in the
ability to detect a gap in low-frequency narrowband noises (2 or 4 kHz) or in
broadband noise (Fig. 2 and Supplementary Fig. 11). This result indicates that
these rats are able to respond normally to the startling noise burst and that the
mechanisms for modulating the startle response using silent gaps remain intact.
Our observation that noise-exposed rats can show gap detection impairments
centred at a single frequency or across a narrow range of frequencies is consistent
with clinical studies showing significant heterogeneity across subjects in the
spectral characteristics of the tinnitus percept. Despite this heterogeneity, a large
fraction of tinnitus patients can match their tinnitus to a pitch and describe their
phantom sound as tonal.

VNS tone delivery to noise-exposed rats. Rats were tested for gap impairment
four weeks after noise exposure and 10 and 20 days after the beginning of the sham
or experimental therapy. In the VNS/multiple tone paired group (n = 5 rats),
tones were paired with VNS every 15 s with no VNS-tone pairing 50% of the time.
The tone frequencies paired with VNS in the therapy group were designed to
reduce the 8-10-kHz region of the frequency map. VNS was repeatedly paired
with a 1.3-, 2.2-, 3.7-, 17.8- or 29.9-kHz tone that was randomly selected every trial
(300 trials per day). Each tone was presented at ~20 dB above the normal hearing
threshold for that frequency. The tone-alone control group was passively exposed
to the same tones on the same schedule as used in the paired group. A VNS-alone
control group received VNS on the same schedule as used in the paired
group without presentation of tones. The third control group did not receive tones
or VNS.
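
The pairing schedule described above can be illustrated with a small simulation. This is a sketch under stated assumptions, not the software used in the study: the function name, the seed and the use of an independent coin flip in each 15-s slot are all hypothetical choices made for illustration.

```python
import random

# Tone frequencies paired with VNS in the therapy group (kHz), from the text.
THERAPY_TONES_KHZ = [1.3, 2.2, 3.7, 17.8, 29.9]


def make_session(n_pairings=300, slot_s=15.0, p_vns=0.5, seed=0):
    """Return a list of (time_s, tone_khz) for one simulated daily session.

    Assumption: every slot_s seconds a pairing occurs with probability p_vns,
    and the tone for each pairing is drawn uniformly from the therapy set.
    """
    rng = random.Random(seed)
    events = []
    t = 0.0
    while len(events) < n_pairings:
        if rng.random() < p_vns:
            events.append((t, rng.choice(THERAPY_TONES_KHZ)))
        t += slot_s
    return events


session = make_session()
```

Every tone in the simulated session lies outside the 8-10-kHz region, which reflects the stated design goal of reducing the cortical representation of the putative tinnitus frequencies.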

To test whether the tinnitus percept remained suppressed after the end of VNS- 
tone pairing, rats were also tested on gap detection one and three weeks after the 
end of therapy. At the end of three weeks (that is, 11 weeks after noise exposure), 
multiunit responses were recorded from auditory cortex neurons from the ther- 
apy, sham and naive control rats using dense microelectrode mapping techniques. 
Physiological and behavioural results from the tone-alone, VNS-alone and no- 
therapy groups were statistically indistinguishable (Supplementary Fig. 10 and 
physiological data not shown). Data from the three groups are combined and 
referred to as sham controls in the main text.

Neurophysiology. In this study, we recorded from a total of 1,492 sites in 21 rats
(n = 8 naive controls, n = 5 VNS therapy and n = 8 sham controls). Nine hundred
and sixty-five of those sites were in A1 and were included in the analysis presented
in this report. We recorded 220 multiunit responses from A1 sites in
noise-exposed rats that received VNS/multiple tone pairing (n = 5). We also recorded
321 A1 sites from noise-exposed rats that did not receive VNS/tone pairing
(n = 8). The latter group included noise-exposed rats that received tones with
no VNS (n = 3), VNS with no tones (n = 2) or no therapy (n = 3). Because neural
and behavioural responses were similar in all three control groups, the results were
pooled to form a single data set referred to as the sham therapy group. During the
acute electrophysiology recordings, sounds were delivered in a foam-lined,
double-walled, sound-attenuated chamber using a speaker (Motorola 40-1221)
positioned directly opposite the left ear at a distance of 10 cm. Multiunit responses
were recorded using Parylene-coated tungsten electrodes that were glued together
(250-µm separation, 2 MΩ at 1 kHz; FHC) and lowered approximately 500 µm
below the cortical surface. Frequency and intensity calibrations were performed
with an ACO Pacific microphone (PS9200-7016) and Tucker-Davis Technologies
SIGCAL v4.2 software. Auditory frequency tuning curves were determined at each
site by presenting 81 logarithmically spaced frequencies spanning 1 to 32 kHz at 16
intensities from 0 to 75 dB SPL (1,296 total stimuli). The tones (25-ms duration,
5-ms rise-fall time) were randomly interleaved and separated by 500 ms. Tuning
curve parameters were determined by an experienced blind observer using custom
software written in MATLAB v7.9 (Mathworks) to randomize the order of data
from each recording site across all groups. Experimenters were blind to the
experimental conditions of each rat during electrophysiology recordings.
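
The tuning-curve stimulus set described above can be reconstructed as a grid. The sketch below is illustrative only: the names are hypothetical, and the even 5-dB spacing of the 16 intensities is an assumption, because the text gives only the number of levels and their range.

```python
import math

N_FREQS = 81           # logarithmically spaced frequencies
N_LEVELS = 16          # intensities
F_MIN_KHZ, F_MAX_KHZ = 1.0, 32.0
L_MIN_DB, L_MAX_DB = 0.0, 75.0

# 1 to 32 kHz spans five octaves, so 81 points give 16 steps per octave.
octaves = math.log2(F_MAX_KHZ / F_MIN_KHZ)
freqs_khz = [F_MIN_KHZ * 2.0 ** (octaves * i / (N_FREQS - 1))
             for i in range(N_FREQS)]

# Assumed: evenly spaced levels, which works out to 5-dB steps.
levels_db = [L_MIN_DB + (L_MAX_DB - L_MIN_DB) * j / (N_LEVELS - 1)
             for j in range(N_LEVELS)]

# Every frequency-intensity combination, before random interleaving.
stimuli = [(f, level) for f in freqs_khz for level in levels_db]
```

The grid contains 81 × 16 = 1,296 combinations, which matches the stated total number of stimuli.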

Data analysis. Gap discrimination was quantified as the percentage inhibition of
the startle response when a gap (warning cue) preceded the startling noise relative
to the startle response when no gap was present. Eight of 36 rats tested failed to
generate consistent startle responses and were excluded from the study before
noise exposure. Noise exposure eliminated the startle response in three of the
remaining 28 rats, and these rats were excluded from the study. Noise exposure
failed to generate any impairment in gap detection in seven of the remaining 25
rats, and these rats were also excluded from the study. Eighteen noise-exposed rats
were included in the study. Neural responses were collected from thirteen rats (five
VNS-tone paired rats and eight sham therapy rats). One rat died before neural
responses could be collected. Only behavioural responses (and EEG) were
collected from the remaining four rats (two treated rats and two shams) so that the
duration of the benefit could be estimated.

Sites were determined to be in A1 on the basis of continuous tonotopy. At
each A1 recording site, characteristic frequency, frequency bandwidth, response
threshold, spontaneous rate and latency were determined using a standard method
in which the experimenter was blind to the experimental group and recording
location. At each pair of simultaneously recorded A1 sites, neural synchrony
during silence (300 s) was quantified as the cross-correlation function. The peak
in the cross-correlation function (with or without subtraction of the shift
predictor) was also computed and gave similar results to the Pearson correlation
coefficient (R). Map plasticity was quantified as the percent of A1 neurons with
a characteristic frequency in a given range or as the percent of A1 neurons
responding to each frequency-intensity combination using the Voronoi tessellation
method of interpolation. Frequency selectivity was quantified as the bandwidth
10, 20, 30 or 40 dB above threshold. Results were similar regardless of the intensity
above threshold used. Excitability was quantified as the number of spikes evoked by
each tone within each site's receptive field and as the spontaneous activity rate
during silence.
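
As an illustration of the Pearson correlation coefficient mentioned above as a measure of synchrony between two simultaneously recorded sites, the following sketch bins two spike trains and correlates the bin counts. It is not the authors' analysis: the function names, the bin width and the example spike times are hypothetical, and the cross-correlation function and shift-predictor subtraction described in the text are not implemented here.

```python
import math


def bin_spikes(spike_times_s, duration_s, bin_s):
    """Count spikes in consecutive bins of width bin_s over duration_s."""
    n_bins = int(round(duration_s / bin_s))
    counts = [0] * n_bins
    for t in spike_times_s:
        idx = int(t / bin_s)
        if 0 <= idx < n_bins:
            counts[idx] += 1
    return counts


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical spike times (s) from two sites over a 4-s window, 1-s bins.
site_a = bin_spikes([0.1, 0.15, 1.2, 2.7], duration_s=4.0, bin_s=1.0)
site_b = bin_spikes([0.2, 0.25, 1.6, 2.1], duration_s=4.0, bin_s=1.0)
r_ab = pearson_r(site_a, site_b)
```

For the 300 s of silence described in the text, the same functions would be called with `duration_s=300.0` and a bin width chosen by the analyst.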

All protocols and recording procedures comply with the NIH Guide for the 
Care and Use of Laboratory Animals and were approved by the Institutional 
Animal Care and Use Committee at the University of Texas at Dallas.


Directed differentiation of human pluripotent stem
cells into intestinal tissue in vitro

Studies in embryonic development have guided successful efforts
to direct the differentiation of human embryonic and induced
pluripotent stem cells (PSCs) into specific organ cell types in
vitro. For example, human PSCs have been differentiated into
monolayer cultures of liver hepatocytes and pancreatic endocrine
cells that have therapeutic efficacy in animal models of liver
disease and diabetes, respectively. However, the generation of
complex three-dimensional organ tissues in vitro remains a major
challenge for translational studies. Here we establish a robust and
efficient process to direct the differentiation of human PSCs into
intestinal tissue in vitro using a temporal series of growth factor
manipulations to mimic embryonic intestinal development. This
involved activin-induced definitive endoderm formation,
FGF/Wnt-induced posterior endoderm patterning, hindgut specification
and morphogenesis, and a pro-intestinal culture system to
promote intestinal growth, morphogenesis and cytodifferentiation.
The resulting three-dimensional intestinal 'organoids'
consisted of a polarized, columnar epithelium that was patterned into
villus-like structures and crypt-like proliferative zones that
expressed intestinal stem cell markers. The epithelium contained
functional enterocytes, as well as goblet, Paneth and enteroendocrine
cells. Using this culture system as a model to study human
intestinal development, we identified that the combined activity of
WNT3A and FGF4 is required for hindgut specification whereas
FGF4 alone is sufficient to promote hindgut morphogenesis. Our
data indicate that human intestinal stem cells form de novo during
development. We also determined that NEUROG3, a pro-endocrine
transcription factor that is mutated in enteric anendocrinosis, is
both necessary and sufficient for human enteroendocrine cell
development in vitro. PSC-derived human intestinal tissue should
allow for unprecedented studies of human intestinal development
and disease.

The epithelium of the intestine is derived from a simple sheet of cells
called the definitive endoderm. As a first step to generating intestinal
tissue from PSCs (summarized in Supplementary Fig. 1), we used
activin A, a nodal-related TGF-β molecule, to promote differentiation
into definitive endoderm as previously described, resulting in up to
90% of the cells co-expressing the definitive endoderm markers SOX17
and FOXA2 and fewer than 2% expressing the mesoderm marker
brachyury (Supplementary Fig. 2a). Using microarray analysis we
observed a robust activation of definitive endoderm markers, many
of which were expressed in mouse definitive endoderm from
embryonic day (e)7.5 embryos (Supplementary Fig. 3 and Supplementary
Table 1a, b). We investigated the intrinsic ability of definitive
endoderm to form foregut and hindgut lineages by culturing for 7 days
under permissive conditions and observed that cultures treated with
activin A for only 3 days were competent to develop into both foregut
(albumin (ALB)+ and PDX1+) and hindgut (CDX2) lineages (Fig. 1b,
control). In contrast, treatment with activin A for 4–5 days resulted in
definitive endoderm cultures that were intrinsically anterior in
character and less competent in forming posterior lineages (Supplementary
Fig. 2b).

Figure 1 | FGF4 and WNT3A act synergistically in a temporal and
dose-dependent manner to specify stable posterior endoderm fate. a–d, Activin A
(100 ng ml⁻¹) was used to differentiate H9 human ES cells into definitive
endoderm. Definitive endoderm was treated with the posteriorizing factors
FGF4 (50 or 500 ng), WNT3A (50 or 500 ng), or both for 6, 48 or 96 h. Cells
were placed in permissive media for 7 days and expression of foregut markers
(ALB, PDX1) and the hindgut marker (CDX2) were analysed by RT-qPCR
(a) and immunofluorescence (b-d). The definitive endoderm of controls was
grown for identical lengths of time in the absence of FGF4 or WNT3A. High
levels of FGF4+WNT3A for 96 h resulted in stable CDX2 expression and lack
of foregut marker expression. Scale bars, 50 µm. Error bars are s.e.m. (n = 3).
*P < 0.05, **P < 0.001, ***P < 0.0001.

Having identified the window of time when definitive endoderm
fate was plastic (day 3 of activin A treatment), we used WNT3A and
FGF4 to promote hindgut and intestinal specification. Studies in
mouse, chick and frog embryos have demonstrated that Wnt and
FGF signalling pathways are required for repressing anterior
development and promoting posterior endoderm formation into the midgut
and hindgut. Consistent with this, conditioned media containing
WNT3A was recently shown to promote Cdx2 expression in mouse
embryonic stem (ES)-cell-derived embryoid bodies. In human
definitive endoderm cultures, neither factor alone was sufficient to robustly
promote a posterior fate (Supplementary Fig. 2c); but high
concentrations of both FGF4 and WNT3A (FGF4+WNT3A) induced
expression of the hindgut marker CDX2 in the definitive endoderm after 48 h
(Supplementary Fig. 4). However, 48 h of FGF4+WNT3A treatment
did not stably induce a CDX2+ hindgut fate and expression of anterior
markers PDX1 and albumin reappeared after cells were cultured in
permissive media for 7 days (Fig. 1a, c). In contrast, 96 h of exposure
to FGF4+WNT3A resulted in stable CDX2 expression and absence of
anterior markers (Fig. 1a, d). These findings indicate a previously
unidentified requirement for the synergistic activities of both the
FGF and Wnt pathways in specifying the CDX2+ mid/hindgut lineage.

Remarkably, FGF4+ WNT3A-treated cultures underwent morpho- 
genesis that was similar to embryonic hindgut formation. Between 2 
and 5 days of FGF4+ WNT3A treatment, flat cell sheets condensed into 
CDX2"* epithelial tubes, many of which budded off to form floating 
hindgut spheroids (Fig. 2a-c, Supplementary Fig. 5a-f and Sup- 
plementary Table 2a). Spheroids were similar to e8.5 mouse hindgut 
and consisted of uniformly CDX2* polarized epithelium surrounded 
by CDX2* mesenchyme (Fig. 2d-g). Spheroids were completely devoid 

Figure 2 | Morphogenesis of posterior endoderm into three-dimensional, 
hindgut-like spheroids. a, Bright-field images of definitive endoderm cultured 
for 96 h in media, FGF4, WNT3A or FGF4+WNT3A. FGF4+WNT3A 
cultures contained three-dimensional epithelial tubes and free-floating spheres 
(black arrows). b, CDX2 immunostaining (green) and nuclear stain (DRAQ5, 
blue) on cultures shown in a. Insets show CDX2 staining alone. c, Bright-field 
image of hindgut-like spheroids. a-c, Scale bars, 50 µm. d-f, Analysis of CDX2, 
basal-lateral laminin and E-cadherin expression demonstrates an inner layer of 
polarized, cuboidal, CDX2+ epithelium surrounded by non-polarized 
mesenchymal CDX2+ cells. Scale bar in e is 20 µm. g, CDX2 expression in an 
e8.5 mouse embryo (sagittal section). Inset is a magnified view showing that 
both hindgut endoderm (E; outlined with a red dashed line) and adjacent 
mesenchyme (M) are CDX2 positive (green). FG, foregut; HG, hindgut. Scale 
bar, 100 µm. 

of albumin and PDX1-expressing foregut cells (Supplementary Fig. 5h, 
i). In vitro gut-tube morphogenesis was never observed in control or 
WNT3A-only treated cultures. FGF4-treated cultures had a twofold 
expansion of mesoderm and generated 4-10-fold fewer spheroids 
(Supplementary Fig. 2c and Supplementary Table 2a), which were 
weakly CDX2+ and did not undergo further expansion (data not 
shown). Together, our data support a mechanism for hindgut development 
in which FGF4 promotes mesoderm expansion and morphogenesis, 
whereas FGF4 and WNT3A synergy is required for the specification of 
the hindgut lineage. 

Importantly, this method for directed differentiation is broadly 
applicable to other PSC lines, as we were able to generate hindgut 
spheroids from both H1 and H9 human ES cell lines and from four 
induced PSC (iPSC) lines that we have generated and characterized 
(Supplementary Figs 3, 5 and 6). The kinetics of differentiation and the 
formation of spheroids were comparable between these lines (Sup- 
plementary Table 2). Two other iPSC lines tested were poor at hindgut 
spheroid formation and line iPSC3.6 also had a divergent transcrip- 
tional profile during definitive endoderm formation (Supplementary 
Fig. 3 and Supplementary Table 2c). 

Whereas in vivo engraftment of PSC-derived cell types, such as pan- 
creatic endocrine cells, has been used to promote maturation”, efficient 
development and maturation of organ tissues in vitro has proven more 
difficult. We investigated whether hindgut spheroids could develop and 
mature into intestinal tissue in vitro using recently described three- 
dimensional culture conditions that support growth and renewal of 
the adult intestinal epithelium’*'®. When placed into this culture 
system, hindgut spheroids developed into intestinal organoids in a 
staged manner that was notably similar to fetal gut development 
(Fig. 3, Supplementary Fig. 5g and Supplementary Fig. 7). In the first 
14 days the simple cuboidal epithelium of the spheroid expanded and 
formed a highly convoluted pseudostratified epithelium surrounded by 
mesenchymal cells (Fig. 3a-c), similar to an e12.5 fetal mouse gut 
(Fig. 3f). After 28 days, the epithelium matured into a columnar 
epithelium with villus-like involutions that protrude into the lumen 
of the organoid (Fig. 3d, e). Comparable transitions were observed 
during mouse fetal intestinal development (Fig. 3f, g and Supplementary 
Fig. 7). The spheroids expanded up to 40-fold in mass as they 
formed organoids (data not shown) and were split and passaged over 
9 additional times and cultured for over 140 days with no signs of 
growth failure. The cellular gain during that time was up to 1,800-fold 
(data not shown), resulting in a total cellular expansion of up to 72,000-fold 
per hindgut spheroid. This directed differentiation was up to 50-fold 
more efficient than spontaneous embryoid body differentiation meth- 
ods”’ (Supplementary Fig. 8) and resulted in organoids that were almost 
entirely intestinal (Supplementary Fig. 2e-g) as compared to embryoid 
bodies that contained a mix of neural, vascular and epidermal tissues 
(Supplementary Fig. 8). 
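
The total expansion quoted above is the product of the two reported stages. As a quick arithmetic check (a sketch only; the variable names are illustrative, and both inputs are the upper-bound "up to" values given in the text):

```python
# Upper-bound values reported in the text (both are "up to" figures).
spheroid_to_organoid_fold = 40   # expansion as spheroids form organoids
passaging_fold = 1_800           # cellular gain over repeated passaging

# The two stages compound, so the fold changes multiply.
total_fold = spheroid_to_organoid_fold * passaging_fold
print(f"{total_fold:,}-fold")
```

Because both inputs are maxima, the product is itself an upper bound rather than a typical value.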

Marker analysis showed that after 14 days in culture, virtually all of 
the epithelium expressed the intestinal transcription factors CDX2, 
KLF5 and SOX9 broadly and was highly proliferative (Fig. 3b, c). By 
28 days, CDX2 and KLF5 remained broadly expressed in over 90% of 
the epithelium (Supplementary Fig. 2), whereas SOX9 became localized 
to pockets of proliferating cells at the base of the villus-like protrusions 
(Fig. 3d, e), similar to the intervillus epithelium of fetal mouse intestines 
at e16.5 (Fig. 3g and Supplementary Fig. 9). 5-bromodeoxyuridine 
(BrdU) pulse chase and analysis of organoids using a Z-stack series 
of confocal microscopic images showed that epithelial BrdU incorp- 
oration was largely restricted to SOX9-expressing cells in crypt-like 
structures that penetrated into the underlying mesenchyme (Supplementary 
Fig. 9). At 28 days, LGR5 is not expressed and ASCL2 
(ref. 21) is broadly expressed and not restricted to the SOX9+ proliferative 
zone. However, organoids cultured until 56 days expressed both 
ASCL2 and LGR5 in restricted epithelial domains that appear to overlap 
with the SOX9+ zone (Fig. 3h-j and Supplementary Fig. 10). This 
domain is similar to developing intestinal progenitor domains in vivo, 

Figure 3 | Human ES cells and iPSCs form three-dimensional intestine-like 
organoids. a, A time course shows that intestinal organoids formed highly 
convoluted epithelial structures surrounded by mesenchyme after 13 days (d). 
b-e, Intestinal transcription factor expression (KLF5, CDX2, SOX9) and cell 
proliferation on serial sections of organoids after 14 and 28 days (serial sections 
are b and c, d and e). Ki67, nuclear proliferation antigen. Nuc, nuclei. 
f, g, Expression of KLF5, CDX2 and SOX9 in mouse fetal intestine at e14.5 
(f) and e16.5 (g) is similar to developing intestinal organoids. The right panels 
show separate colour channels for d, e and g (bracket highlights the region 
shown in the panels on the right). h-j, Whole mount in situ hybridization of 
56-day-old organoids showing epithelial expression of SOX9 (h) and restricted 
‘crypt-like’ expression of the stem cell markers LGR5 (i) and ASCL2 (j). Insets 
show sense controls for each probe. Scale bars, 20 µm. 


which ultimately give rise to the stem cell niche in the crypt of 
Lieberkühn. iPSCs were equally capable of forming intestinal progenitor 
domains (Supplementary Fig. 9e). Thus, PSC-derived intestinal 
epithelium continued to mature in vitro and develop proliferative 
domains with nascent intestinal stem cells. 

Between 18 and 28 days in culture, we observed cytodifferentiation 
of the stratified epithelium into a columnar epithelium containing 
brush borders and all of the major cell lineages of the gut as determined 
by immunofluorescence and quantitative polymerase chain reaction 
with reverse transcription (RT-qPCR) (Fig. 4a-d and Supplementary 
Fig. 11). By 28 days of culture, villin (Fig. 4a) and DPPIV (not shown) 
were localized to the apical surface of the polarized columnar epithe- 
lium and transmission electron microscopy revealed a brush border of 
apical microvilli indistinguishable from those found in mature intest- 
ine (Fig. 4d and Supplementary Fig. 1). Enterocytes had a functional 
peptide transport system and were able to absorb a fluorescently 

Figure 4 | Formation and function of intestinal cell types and regulation of 
enteroendocrine differentiation by NEUROG3. a-c, Twenty-eight-day 
iPSC-derived organoids were analysed for villin (VIL) and the goblet cell 
marker mucin (MUC2) (a), the Paneth cell marker lysozyme (LYZ) (b), or the 
endocrine cell marker chromogranin A (CHGA) (c). Nuc, nuclei. d, Electron 
micrograph showing an enterocyte cell with a characteristic brush border with 
microvilli (inset). e, Epithelial uptake of the fluorescently labelled dipeptide 
d-Ala-Lys-AMCA (arrowheads) indicating a functional peptide transport 
system. f-h, Adenoviral expression of NEUROG3 (Ad-NEUROG3) causes a 
fivefold increase in CGA+ cells compared to a GFP control (Ad-GFP). n = 4 
biological samples; *P = 0.005. i-k, Organoids were generated from human ES 
cells that were stably transduced with shRNA-expressing lentiviral vectors. 
Compared to control shRNA organoids, NEUROG3 shRNA organoids had a 
95% reduction in the number of CHGA+ cells. n = 3 for shRNA controls and 
n = 5 for NEUROG3-shRNA; *P = 0.018. Scale bar in a is 10 µm; all others are 
20 µm. Error bars are s.e.m. 


labelled dipeptide (Fig. 4e). Cell counting revealed that the epithelium 
contained approximately 15% MUC2+ goblet cells, which secrete 
mucins into the lumen of the organoid, 18% lysozyme-positive cells, 
which are indicative of Paneth cells, and ~1% chromogranin-A-expressing 
enteroendocrine cells (Fig. 4 and Supplementary Fig. 11g). 
MUC2 and lysozyme staining indicated that the goblet and Paneth cells 
in 28-day organoids are immature (Fig. 4a, b). However, in organoids 
that were passaged over 100 days, all cells had acquired a more mature 
phenotype and Paneth cells were often localized in crypt-like structures 
(Supplementary Fig. 12b, c). RT-qPCR confirmed the presence of 
additional markers of differentiated enterocytes (FABP2; also known 
as IFABP) and Paneth cells (MMP7) (Supplementary Fig. 11). Individual 
organoids seemed to be a mix of proximal intestine (GATA4+/GATA6+) 
and distal intestine (GATA4−/GATA6+; HOXA13-expressing) 
(Supplementary Figs 11 and 13). Thus, directed differentiation 
of PSCs into intestinal tissue in vitro is highly efficient in 
generating three-dimensional intestinal tissue containing crypt-like 
progenitor niches, villus-like domains and all of the differentiated cell 
types of the intestinal epithelium. 

Intestinal organoids contained a mesenchymal layer that developed 
along with the epithelium in a staged manner similar to embryonic 
development’’* (Supplementary Fig. 14). Mesenchyme probably 
came from the 2% of mesoderm cells that were present after activin 
differentiation, which expanded up to 10% in FGF4-treated hindgut 
cultures (Supplementary Fig. 2). At 14 days, organoids broadly 
expressed mesenchymal markers including FOXF1 and vimentin 
(Supplementary Fig. 14), similar to an e12.5 embryonic intestine (Supplementary 
Fig. 7). We also observed vimentin/smooth muscle actin 
(SMA; also known as ACTA2) double-positive cells indicative of 
intestinal subepithelial myofibroblasts”. By 28 days, we observed a layer 
of SMA+/desmin+ double-positive cells, indicating smooth muscle, 
and desmin* /vimentin”* fibroblasts”®. The fact that intestinal mesench- 
yme differentiation coincided with differentiation of the overlying 
epithelium indicates that epithelial-mesenchymal crosstalk may be 
important in the development of PSC-derived intestinal organoids. 

The molecular basis of congenital malformations in humans is often 
inferred from functional studies in model organisms. For example, 
neurogenin 3 (NEUROG3) was investigated as a candidate gene 
responsible for congenital loss of intestinal enteroendocrine cells in 
humans’* because of its known role in enteroendocrine cell develop- 
ment in mouse” *°. However, it has been impossible to directly investi- 
gate the role of NEUROG3 during human intestinal development. We 
therefore performed gain- and loss-of-function analyses to investigate 
the role of NEUROG3 during human enteroendocrine cell development 
(Fig. 4 and Supplementary Fig. 15). NEUROG3 was overexpressed in 
28-day human organoids using adenoviral (Ad)-mediated transduction. 
After 7 days, approximately 5% of cells were GFP+ and 
Ad-NEUROG3-GFP-infected organoids contained fivefold more 
chromogranin A+ endocrine cells than control organoids (Ad-enhanced 
GFP (eGFP)) (Fig. 4f-h and Supplementary Fig. 15), demonstrating 
that NEUROG3 expression is sufficient to promote an enteroendocrine 
cell fate. To knock down endogenous NEUROG3, we generated human 
ES cell lines by transducing cells with NEUROG3 short hairpin 
(sh)RNA-expressing lentiviral vectors. NEUROG3 mRNA levels were 
knocked down by 63% and this resulted in a 90% reduction in the 
number of enteroendocrine cells (Fig. 4i-k and Supplementary Fig. 
15d-f), demonstrating that intestinal enteroendocrine cell development 
is highly dependent on NEUROG3 expression. This indicates that 
partial loss-of-function mutations in human NEUROG3 would be suf- 
ficient to cause a marked reduction in enteroendocrine cell numbers. 

This is the first report, to our knowledge, demonstrating that human 
PSCs can be efficiently directed to differentiate in vitro into human 
tissue with a three-dimensional architecture and cellular composition 
remarkably similar to the fetal intestine. Moreover, PSC-derived 
human intestinal tissue undergoes maturation in vitro, developing 
intestinal stem cells and acquiring both absorptive and secretory func- 
tionality. This system allows for functional studies to investigate the 
molecular basis of human congenital gut defects in vitro and to generate 
intestinal tissue for eventual transplantation-based therapy for diseases 
such as necrotizing enterocolitis, inflammatory bowel diseases and 
short-gut syndromes. The ability to generate human intestinal tissues 
should also greatly facilitate future studies of intestinal stem cells and 
drug design to enhance absorption and bioavailability. 


METHODS SUMMARY 

Generation of human intestinal organoids. Human ES cells and iPSCs were 
maintained on Matrigel (BD Biosciences) in mTeSR1 medium without feeders. 
Differentiation into definitive endoderm was carried out as previously described. 
Briefly, a 3-day activin A (R&D Systems) differentiation protocol was used. Cells 
were treated with activin A (100 ng ml⁻¹) for three consecutive days in RPMI 1640 
medium (Invitrogen) with increasing concentrations of 0%, 0.2% and 2% HyClone 
defined fetal bovine serum (dFBS; Thermo Scientific). For hindgut differentiation, 
definitive endoderm cells were incubated in 2% dFBS-DMEM/F12 with 500 ng ml⁻¹ 
FGF4 and 500 ng ml⁻¹ WNT3A (R&D Systems) for up to 4 days. Between 2 
and 4 days of treatment with growth factors, three-dimensional floating spheroids 
formed and were then transferred into three-dimensional cultures previously 
shown to promote intestinal growth and differentiation. Briefly, spheroids were 
embedded in Matrigel (BD Biosciences) containing 500 ng ml⁻¹ R-Spondin1 (R&D 
Systems), 100 ng ml⁻¹ Noggin (R&D Systems) and 50 ng ml⁻¹ EGF (R&D 
Systems). After the Matrigel solidified, medium (advanced DMEM/F12; 
Invitrogen) supplemented with L-glutamine, 10 µM HEPES, N2 supplement 
(R&D Systems), B27 supplement (Invitrogen), and penicillin/streptomycin-
containing growth factors was overlaid and replaced every 4 days. 

Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 

METHODS 


Maintenance of PSCs. Human ES cells and induced pluripotent stem cells were 
maintained on Matrigel (BD Biosciences) in mTeSR1 medium. Cells were 
passaged approximately every 4 days, depending on colony density. To passage 
PSCs, they were washed with DMEM/F12 medium (no serum) (Invitrogen) and 
incubated in DMEM/F12 with 1 mg ml⁻¹ dispase (Invitrogen) until colony edges 
started to detach from the dish. The dish was then washed 3 times with DMEM/F12 
medium. After the final wash, DMEM/F12 was replaced with mTeSR1. 
Colonies were scraped off the dish with a cell scraper, gently triturated into 
small clumps and passaged onto fresh Matrigel-coated plates. 

Differentiation of PSCs into definitive endoderm. Differentiation into definitive 
endoderm was carried out as previously described. Briefly, a 3-day activin A 
(R&D Systems) differentiation protocol was used. Cells were treated with activin A 
(100 ng ml⁻¹) for three consecutive days in RPMI 1640 medium (Invitrogen) with 
increasing concentrations of 0%, 0.2% and 2% HyClone defined fetal bovine serum 
(dFBS; Thermo Scientific). 

Differentiation of definitive endoderm in permissive media. After differentiation 
into definitive endoderm, cells were incubated in DMEM/F12 plus 2% dFBS 
with either 0, 50 or 500 ng ml⁻¹ FGF4 and/or 0, 50 or 500 ng ml⁻¹ WNT3A (R&D 
Systems) for 6, 48 or 96 h. Cultures were then grown in permissive medium 
consisting of DMEM plus 10% FBS for an additional 7 days. 

Directed differentiation into hindgut and intestinal organoids. After differentiation 
into definitive endoderm, cells were incubated in 2% dFBS-DMEM/F12 
with either 50 or 500 ng ml⁻¹ FGF4 and/or 50 or 500 ng ml⁻¹ WNT3A (R&D 
Systems) for 2-4 days. After 2 days of growth factor treatment, 
three-dimensional floating spheroids were present in the culture. Three-dimensional 
spheroids were transferred into a previously described in vitro system that supports 
intestinal growth and differentiation. Briefly, spheroids were embedded in 
Matrigel (BD Biosciences; no. 356237) containing 500 ng ml⁻¹ R-Spondin1 
(R&D Systems), 100 ng ml⁻¹ Noggin (R&D Systems) and 50 ng ml⁻¹ EGF 
(R&D Systems). After the Matrigel solidified, medium (advanced DMEM/F12; 
Invitrogen) supplemented with L-glutamine, 10 µM HEPES, N2 supplement 
(R&D Systems), B27 supplement (Invitrogen), and penicillin/streptomycin-
containing growth factors was overlaid and replaced every 4 days. 
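
The staged protocol described above can be written down as a small data structure, which makes the timing and factor concentrations easy to inspect. This is a minimal sketch, not the authors' software: the `Stage` class, the `PROTOCOL` list and the `days_to_organoid` helper are hypothetical names, and the 28-day organoid stage is an assumption taken from the time point at which the main text reports a columnar epithelium (organoids were also cultured for much longer).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One culture stage; concentrations are in ng/ml as stated in the Methods."""
    name: str
    base_medium: str
    factors: dict
    min_days: int
    max_days: int


PROTOCOL = [
    Stage("definitive endoderm", "RPMI 1640 + 0%, 0.2%, 2% dFBS on days 1, 2, 3",
          {"activin A": 100}, 3, 3),
    Stage("hindgut spheroids", "DMEM/F12 + 2% dFBS",
          {"FGF4": 500, "WNT3A": 500}, 2, 4),
    # Assumption: 28 days chosen as the organoid read-out time point.
    Stage("intestinal organoids", "Matrigel + advanced DMEM/F12",
          {"R-Spondin1": 500, "Noggin": 100, "EGF": 50}, 28, 28),
]


def days_to_organoid(protocol):
    """Return the (shortest, longest) total duration in days across all stages."""
    return (sum(s.min_days for s in protocol),
            sum(s.max_days for s in protocol))


shortest, longest = days_to_organoid(PROTOCOL)
print(f"{shortest}-{longest} days from PSCs to 28-day organoids")
```

Keeping the schedule as data rather than prose makes it straightforward to vary one parameter, such as the 50 versus 500 ng/ml growth factor doses tested in the Methods.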

Generation and characterization of iPSC lines. Normal human skin keratinocytes 
(HSKs) were obtained from donors with informed consent (Cincinnati 
Children’s Hospital Medical Center (CCHMC) Institutional Review Board protocol 
CR1_2008-0899). Normal HSKs were isolated from punch biopsies following 
trypsinization and subsequent culture on irradiated NIH3T3 feeder cells in F 
medium. For iPSC generation, normal HSKs were transduced on two consecutive 
days with a 1:1:1:1 mix of recombinant RD114-pseudotyped retroviruses 
expressing OCT4, SOX2, KLF4 and MYC in the presence of 8 µg ml⁻¹ polybrene. 
Twenty-four hours after the second transduction the virus mix was replaced 
with fresh F medium and cells were incubated for an additional three days. Cells 
were then trypsinized and seeded into 6-well dishes containing 1.875 × 10⁵ irradiated 
mouse fibroblasts per well and Epilife medium. On the following day, 
medium was replaced with DMEM/F12 50:50 medium supplemented with 20% 
knockout serum replacement, 1 mM L-glutamine, 0.1 mM β-mercaptoethanol, 1× 
non-essential amino acids, 4 ng ml⁻¹ basic fibroblast growth factor, and 0.5 mM 
valproic acid. Morphologically identifiable iPSC colonies arose after 2-3 weeks 
and were picked manually, expanded and analysed for expression of the human PSC 
markers NANOG and DNMT3B and, using antibodies, of the antigens Tra1-60 and 
Tra1-81. Early passage iPSC lines were adapted to feeder-free culture conditions 
consisting of maintenance in mTeSR1 (Stem Cell Technologies) in culture 
dishes coated with Matrigel (BD Biosciences) and lines were karyotyped. 

Microarray analysis of human ES cells, iPSCs and definitive endoderm cultures. 
For microarray analysis, RNA was isolated from undifferentiated and 3-day 
activin-treated human ES cell and iPSC cultures and used to create target DNA for 
hybridization to Affymetrix Human 1.0 Gene ST Arrays using standard procedures 
(Affymetrix). Independent biological triplicates were performed for each cell 
line and condition. Affymetrix microarray CEL files were subjected to RMA normalization 
in GeneSpring 10.1. Probe sets were first filtered for those that are 
overexpressed or underexpressed and then subjected to statistical analysis for 
differential expression by twofold or more between undifferentiated and differentiated 
cultures with P < 0.05 using Student's t-test. Log2 gene expression ratios 
were then subjected to hierarchical clustering using the standard correlation distance 
metric as implemented in GeneSpring. 
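
Two of the steps named above, the twofold-change filter and a correlation-based distance between expression profiles, can be illustrated with a short standard-library sketch. It is not the GeneSpring implementation: the function names are hypothetical, the input values are invented toy numbers, the distance shown is a generic 1 − Pearson r (which may differ from GeneSpring's "standard correlation" metric), and the t-test step is omitted.

```python
import math


def mean(values):
    return sum(values) / len(values)


def log2_ratio(undifferentiated, differentiated):
    """log2 of mean differentiated over mean undifferentiated (linear-scale input)."""
    return math.log2(mean(differentiated) / mean(undifferentiated))


def passes_fold_filter(undifferentiated, differentiated, min_fold=2.0):
    """True if the change is at least `min_fold` in either direction."""
    ratio = log2_ratio(undifferentiated, differentiated)
    return abs(ratio) >= math.log2(min_fold)


def correlation_distance(x, y):
    """Generic correlation distance, 1 - Pearson r, between two profiles."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 1.0 - cov / (sx * sy)


# Invented toy triplicates, not data from the paper.
induced = passes_fold_filter([100, 110, 90], [400, 420, 380])
unchanged = passes_fold_filter([100, 100, 100], [120, 130, 110])
print(induced, unchanged)
```

A distance near 0 means two profiles rise and fall together, and a distance near 2 means they are anti-correlated; hierarchical clustering then merges the closest profiles first.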

Adenoviral-mediated expression of NEUROG3. Adenoviral plasmids were 
obtained from Addgene and particles were generated as previously described. 
Transduction was done on 28-day organoids that were removed from Matrigel, 
manually bisected then incubated in Ad-GFP or Ad-NEUROG3 viral supernatant 
and medium at a 1:1 ratio for approximately 4 h. Organoids were then re-embedded 
in Matrigel and incubated overnight with viral supernatant and medium at a 1:1 
ratio. The next day, fresh organoid medium was placed on the cultures and was 
changed as described until the end of the experiment. 

shRNA knockdown human ES cell lines. GipZ shRNA lentiviral vectors were 
obtained from Open Biosystems (GipZ-NEUROG3 Open Biosystems clone no. 
v2lhs_309089; v2lhs_309091; v2lhs_309093; v2lhs_309092 and GipZ-Control; 
Open Biosystems clone no. RHS4346). The CCHMC Viral Vector Core produced 
high-titre lentiviral particles for each plasmid. Low-passage H9 human ES cells 
were dissociated into a single-cell suspension using Accutase, spun down and 
resuspended in mTeSR1 containing 10 µM Y-27632. Cells were plated at low density 
and incubated with lentivirus for 24 h. For the NEUROG3 shRNA knockdown line, 
particles from all four vectors were used. mTeSR1 was replaced daily, and after 72 h 
selection for puromycin- (2-4 µg ml⁻¹) resistant human ES cells was carried out. 
Puromycin-resistant colonies were routinely maintained and passaged in 
mTeSR1+puromycin (4 µg ml⁻¹). 

β-Ala-Lys-AMCA uptake. β-Ala-Lys-AMCA was purchased from BioTrend 
Chemicals and was resuspended in water. Intestinal organoids were cut in half using 
a scalpel and were incubated for four hours in advanced DMEM/F12 plus 24 µM 
β-Ala-Lys-AMCA. Following incubation, organoids were washed several times in 
PBS, embedded in OCT freezing medium and were frozen at −70 °C. Ten-micrometre 
cryosections were cut and processed for standard immunohistochemistry. 

Tissue processing, immunohistochemistry and microscopy. Tissues were fixed 
for 1 h to overnight in 4% paraformaldehyde or 3% glutaraldehyde for transmission 
electron microscopy (TEM). Cultured PSCs and definitive endoderm cells 
were stained directly. Hindgut and intestinal organoids were embedded in paraffin, 
epoxy resin LX-112 (Ladd Research), or frozen in OCT. Sections were cut at 
6-10 µm for standard microscopy and 0.1 µm for TEM. TEM sections were stained 
with uranyl acetate. Paraffin sections were deparaffinized, subjected to antigen 
retrieval, blocked in the appropriate serum (5% serum in 1× PBS plus 0.5% 
Triton-X) for 30 min, and incubated with primary antibody overnight at 4 °C. 
Slides were washed and incubated in secondary antibody in blocking buffer for 
2 h at room temperature (23 °C). For a list of antibodies used and dilutions, see 
Supplementary Table 3. Slides were washed and mounted using Fluormount-G. 
Confocal images were captured on a Zeiss LSM510 and Z-stacks were analysed and 
assembled using AxioVision software. A Hitachi H7600 transmission electron 
microscope was used to capture images. 

RNA isolation, RT-qPCR. RNA was isolated using the Nucleospin II RNA isolation 
kit (Clontech). Reverse transcription was carried out using the SuperScript III 
Supermix (Invitrogen) according to the manufacturer’s protocol. Finally, qPCR was 
carried out using Quantitect SybrGreen MasterMix (Qiagen) on a Chromo4 Real-Time 
PCR system (BioRad). PCR primer sequences were typically obtained from 
qPrimerDepot (http://primerdepot.nci.nih.gov/). Primer sequences are available 
upon request. 

The principal immune mechanism against biotrophic pathogens in 
plants is the resistance (R)-gene-mediated defence’. It was pro- 
posed to share components with the broad-spectrum basal defence 
machinery’. However, the underlying molecular mechanism is 
largely unknown. Here we report the identification of novel genes 
involved in R-gene-mediated resistance against downy mildew in 
Arabidopsis and their regulatory control by the circadian regulator, 
CIRCADIAN CLOCK-ASSOCIATED 1 (CCA1). Numerical clus- 
tering based on phenotypes of these gene mutants revealed that 
programmed cell death (PCD) is the major contributor to resist- 
ance. Mutants compromised in the R-gene-mediated PCD were also 
defective in basal resistance, establishing an interconnection 
between these two distinct defence mechanisms. Surprisingly, we 
found that these new defence genes are under circadian control by 
CCA1, allowing plants to ‘anticipate’ infection at dawn when the 
pathogen normally disperses the spores and time immune responses 
according to the perception of different pathogenic signals upon 
infection. Temporal control of the defence genes by CCA1 differ- 
entiates their involvement in basal and R-gene-mediated defence. 
Our study has revealed a key functional link between the circadian 
clock and plant immunity. 

The life cycles of biotrophic pathogens of plants are intimately 
linked with host metabolism controlled by the cycle of day and night. 
Hence, their interactions with the host may be dictated by the circadian 
clock. This is especially likely as plants do not have specialized immune 
cells and their immune responses have to be finely balanced with other 
cellular functions. However, a link between the circadian clock and 
plant defence has never been firmly established’. 

The major plant defence strategy against biotrophic pathogens is 
resistance (R)-gene-mediated immunity. Detection of a pathogen- 
encoded virulence effector by the R protein triggers programmed cell 
death (PCD) and several other physiological responses collectively 
known as the hypersensitive response*. The effector-specific R-gene- 
mediated resistance may share components with the broad-spectrum 
basal defence machinery’. But how these components are differentially 
regulated is still unclear. 

We chose to study resistance against Hyaloperonospora arabidopsidis 
(Hpa) because this obligate biotrophic oomycete pathogen causes 
downy mildew disease on Arabidopsis leaves through clearly defined 
infection steps, allowing better dissection of the corresponding resistance 
mechanisms blocking these steps. The Arabidopsis Columbia 
(Col-0) accession is resistant to the Hpa Emwa1 isolate due to the 
presence of the R gene, RPP4 (ref. 6). We performed a time-course 
expression profiling of wild type and rpp4 (Supplementary Fig. 1) in 
response to Hpa Emwa1 infection. Based on the phenotype development 
and defence marker gene expression (Supplementary Fig. 2a, b), 
we identified 106 genes differentially expressed in wild type and rpp4 at 
2 days post inoculation (dpi) (Supplementary Fig. 2c). These candidate 
genes were induced earlier than the previously reported immune regulators 
including EDS5, PAD4, PBS3, ICS1, NDR1 and EDS1 (ref. 7), 
which were known to function downstream of R gene activation. 

We inoculated the T-DNA insertion mutants (from the ABRC and 
NASC Stock Centres) of these 106 candidate genes with Hpa Emwa1 
and identified 22 mutants that displayed enhanced susceptibility com- 
pared to wild type based on sporangiophore growth and other disease 
symptoms (for example, chlorosis) by microscopic inspection. For 
most of the 22 genes, at least two homozygous mutant T-DNA alleles 
were tested (Supplementary Fig. 3 and Supplementary Tables 1 and 2). 

To identify specific resistance defects in the mutants, we stained the 
infected plants with lactophenol trypan blue (LTB) at 7 dpi and scored 
for the occurrence of the seven phenotypes represented in Supplementary 
Fig. 4a. As shown in Supplementary Fig. 4b, the rpp4 mutant had 
the highest percentage of leaves with sporangiophores (SPP), confirming 
that its resistance to Hpa Emwa1 is completely compromised, as 
SPP indicates completion of the infection cycle. Wild type had the 
highest score of discrete hypersensitive response (DIH), which was 
defined by the small cluster of infected host cells that underwent 
PCD, a phenotype associated with R-gene-mediated resistance. The 
phenotype scores are also presented numerically in Fig. 1a and the 
mutants are ranked on the basis of their SPP scores. 

Hierarchical clustering of the mutants using their phenotype scores 
(Fig. 1a) put these 22 gene mutants into two groups (Fig. 1b). Similar 
groupings were also obtained with data from three biological replicates 
(Supplementary Fig. 5). Eigenvectors derived from principal component 
analysis indicate that 63.8% of the phenotype variations could be 
accounted for by PC1 with indicators of resistance, DIH and expanding 
hypersensitive response (EXH), as positive contributors and disease 
phenotypes, SPP and free hypha (FRH), as negative contributors 
(Fig. 1c). If PC2 was also considered, six out of the seven phenotypes 
had significant contributions. 
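
The idea behind the principal component analysis above, that one axis can capture most of the variation when resistance and disease phenotypes move in opposite directions, can be shown with a small standard-library sketch. It is illustrative only: the scores below are invented toy values rather than data from Fig. 1a, only four of the seven phenotypes are included, the function names are hypothetical, and power iteration is used here simply as a dependency-free way to obtain the first eigenvector.

```python
import math


def covariance(rows):
    """Sample covariance matrix of the columns of `rows`."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    centred = [[v - m for v, m in zip(row, means)] for row in rows]
    p = len(means)
    return [[sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(p)]
            for i in range(p)]


def first_component(cov, iterations=500):
    """Dominant eigenvector and eigenvalue of `cov` by power iteration."""
    p = len(cov)
    v = [1.0] + [0.0] * (p - 1)  # start on the first axis (SPP here)
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(p))
                     for i in range(p))
    return v, eigenvalue


# Invented toy scores (percentage of leaves). Columns: SPP, FRH, EXH, DIH.
scores = [
    [95, 90, 10, 5],   # susceptible-like genotype
    [80, 85, 20, 10],
    [20, 15, 70, 60],
    [5, 10, 80, 90],   # resistant-like genotype
]

cov = covariance(scores)
pc1, eigenvalue = first_component(cov)
explained = eigenvalue / sum(cov[i][i] for i in range(len(cov)))
print([round(c, 2) for c in pc1], round(explained, 3))
```

The sign of an eigenvector is arbitrary, so only the relative signs matter: in this toy example the disease phenotypes (SPP, FRH) load with the opposite sign to the resistance phenotypes (EXH, DIH), which is the pattern the text describes for PC1.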

The Group 1 mutants (red numbers in Fig. 1b and Supplementary 
Fig. 5) seem to be defective in R-gene-mediated PCD (low DIH and 
EXH scores) with high disease symptoms (FRH and SPP). In contrast, 
the Group 2 mutants (blue numbers) appeared to be intact in PCD with 
high EXH and DIH scores and milder symptoms (low FRH and SPP 
scores). To determine the resistance defects in Group 2 mutants, we 
examined them for other R-gene-mediated physiological responses, 
such as accumulation of phenolic compounds involved in cell wall 
strengthening against pathogen penetration and deposition of callose 
after Hpa Emwa1 inoculation. We found that Mutant 16 (tyrosine 
aminotransferase 3) was defective in phenolic compound accumula- 
tion at the site of pathogen penetration (Supplementary Fig. 6a) 
and Mutants 12 (lipase class 3 family protein) and 14 (cyclic nucleo- 
tide gated channel 3) showed a deficiency in callose deposition in 

[Figure 1a, b: table of phenotype scores for rpp4 and the numbered mutants (columns No., Description, SPP, TRN, OOS, FRH, FHI, EXH, DIH) and cluster dendrogram; see caption below.]

Figure 1 | Phenotypic analyses discovered two distinct RPP4-mediated 
resistance responses against Hpa Emwa1. a, Phenotype scores (percentage in 
40 leaves per genotype). SPP, sporangiophore; TRN, trailing necrosis; OOS, 
oospore; FRH, free hypha; FHI, free hyphal intermediate; EXH, expanding 
hypersensitive response; DIH, discrete hypersensitive response. *P < 0.05. 
b, Mutants were clustered on the basis of their phenotype scores in Fig. 1a. 


response to Hpa Emwa1 similar to that observed in rpp4 (Supplementary 
Fig. 6b). 

Collectively, these observations suggest that RPP4 regulates at least 
two separate responses (Fig. 1d): Group 1 genes are required for 
R-gene-mediated PCD, as mutations in these genes led to low DIH and 
EXH scores and formation of FRH, TRN and SPP. The Group 2 genes 
are probably involved in defence responses other than PCD, such as 
callose deposition and phenolic compound accumulation. Loss of 
these latter functions resulted in pathogen penetration even in the 
presence of PCD. One important conclusion from these data is that 
PCD is the predominant resistance response against Hpa Emwa1 
because the Group 1 mutants were more susceptible, on the basis of 
the SPP scores (except Mutant 12), than the Group 2 mutants (Fig. 1a, 
b). This is supported by the eigenvector composition where PC1 (DIH 
and EXH) was the major contributor to the phenotypic variations 
(Fig. 1c). This finding is consistent with the fact that Hpa Emwa1 is 
an obligate biotrophic pathogen. Suicidal death of the host cell means 
the end of the pathogen life cycle. The functional diversity of the Group 
1 genes indicates that RPP4-mediated PCD is orchestrated by changes 
in multiple biological processes, rather than a single triggering event. 

We also subjected the 22 defence gene mutants to infection by the
virulent isolate Hpa Noco2, for which a cognate R gene is absent in Col-0.
We found that 10 of the mutants displayed significantly enhanced
disease susceptibility (Fig. 2a), demonstrating that these defence genes
are involved in both R gene-specific PCD and general basal resistance.
To determine whether the observed defect in Hpa Emwa1 resistance is
RPP4-specific or due to compromised basal defence, we infected the 
mutants with Hpa isolates, Cala2 and Hiks1, which are known to have 


Second allele, “A’. Group 1, red; Group 2, blue. au, Approximately unbiased 
P-values (0-100%, the higher the number the more significant). c, Eigenvectors 
derived from PCA. The percentage of phenotypic variations captured by each 
PC is shown. d, A diagram showing that the Group 1 mutants are defective in 
RPP4-mediated PCD, whereas the Group 2 mutants are compromised in 
formation of physical/chemical barriers with intact PCD. 


cognate R genes, RPP2 and RPP7, respectively, in the Col-0 back- 
ground’”*®. None of the mutants showed compromised resistance 
(Fig. 2b and Supplementary Table 1), indicating that the deficiency in
resistance against Hpa Emwa1 is RPP4 gene-specific. However, a few of
the mutants did show defects in RPS2-mediated resistance to a bacterial 
pathogen Pseudomonas syringae pv. maculicola ES4326 carrying 
AvrRpt2 (Psm ES4326/AvrRpt2) (Supplementary Fig. 7a). Three of
them were also hypersusceptible to Psm ES4326 in the absence of the 
AvrRpt2 signal (Supplementary Fig. 7b). 

We next subjected all of the 22 mutants to microbial-associated 
molecular pattern (MAMP) treatments including EF-Tu (elf18) and 
flagellin (flg22) to examine the interconnection between R-gene- 
mediated resistance and MAMP-triggered basal immunity. We found 
that Mutant 1, mutated in the leucine-rich repeat receptor-like kinase 
(LRR-RLK; AT1G35710), was insensitive to elf18 (Fig. 2c, d). Because 
Mutant 1 also showed the highest level of susceptibility to Hpa Emwa1
(Fig. 1a), we propose that this LRR-RLK is a link between MAMP- 
signalling and RPP4-mediated PCD and resistance. Although MAMP- 
triggered immunity is not typically associated with PCD, MAMP 
signalling components have been implicated previously in PCD. 
Mutation of the Arabidopsis BRI1-associated receptor kinase 1 (BAK1), 
which is a MAMP-coreceptor required for responses to elf18 and 
flg22 (ref. 9), was shown to cause spreading necrosis upon pathogen 
challenge, indicating that BAK1 is an inhibitor of PCD'®. However, 
silencing BAK1 in Nicotiana benthamiana blocked the cell death 
induced by the oomycete elicitor INF1 (ref. 11), consistent with our 
finding that a MAMP signalling component is involved in PCD 
resistance against oomycete infection. 
Figure 2 | Some of the RPP4-mediated resistance mutants are also
compromised in basal defence. a, Enhanced disease susceptibility to Hpa
Noco2 based on sporangiospore count 7 dpi (n = 3). b, Summary of the
infection tests on the 22 defence gene mutants (22 mts) using different Hpa
isolates. S, susceptible; R, resistant; EDS, enhanced disease susceptibility. c, Root
length measurements 9 days after elf18 treatment (n = 3). efr, elf18 receptor
mutant. d, Fresh weight measurements 6 days after elf18 treatment (n = 3).
*P < 0.05, **P < 0.01, ***P < 0.001.


Our genetic data showed that R-mediated resistance and basal 
defence share common components. This raises the question of how 
activation of similar sets of genes causes PCD in RPP4-specific resistance
against Hpa Emwa1 and non-specific basal resistance against Hpa
Noco2. To understand the differential regulation of these immune 
mechanisms, we analysed the promoter regions of these 22 genes. 
Using the Athena program (http://www.bioinformatics2.wsu.edu/Athena/),
we found significant enrichment of the ‘evening element’,
which is regulated both positively and negatively by the circadian 
regulator, CCA1 (refs 12, 13). Further examination showed that 14 
of the 22 genes contain either evening element and/or the CCA1- 
binding site and/or have rhythmic expression patterns (Fig. 3a)'*. 
Interestingly, the promoter region of RPP4 also contains two evening 
elements and its expression shows a circadian rhythm. 

To confirm the involvement of the circadian clock in defence, we 
first examined the responses of clock mutants to Hpa Emwa1. The
infection was carried out at dawn, the time when Hpa spores are 
normally disseminated in nature'*. The cca1 mutant (Salk_067780)
and ztl-4 (a mutant of ZEITLUPE)'* showed compromised resistance
whereas a CCA1-overexpression line (CCA1ox)'” showed enhanced
resistance (Fig. 3b). Surprisingly, lhy, the mutant of the CCA1 homologue
LATE AND ELONGATED HYPOCOTYL (LHY)"*, responded as
wild type.

We next examined the expression patterns of all 22 defence genes in 
wild type, rpp4 and cca1 every 2 h in a 46-h time-course, with and
without infection by Hpa Emwa1. Because of the large number of
samples involved, we used the innovative high-throughput RNA
annealing selection ligation-sequencing (RASL-seq) technology” for 
expression analysis (Supplementary Table 3 and Methods). 

As shown in Fig. 3c, consistent with the genetic data, the rhythmic 
expression of LHY was not significantly perturbed by infection in 
either wild type or rpp4. This indicates that RPP4-mediated defence 
does not disrupt the overall running of the clock, but rather engages 
CCA1. This specific sensitivity of CCA1 to infection conditions was
confirmed using a transgenic line expressing CCA1:LUC (ref. 16, Sup- 
plementary Fig. 8a). To eliminate the effects of light changes on CCA1 
expression, we also performed infection in transgenic plants carrying 
the CCA1:LUC and LHY:LUC reporters’* under the free-running light 
cycles (Supplementary Fig. 8b). Similar to the RASL-seq results,
LHY:LUC expression remained unchanged, whereas CCA1:LUC 
expression was significantly induced and became arrhythmic upon 
Hpa Emwa1 challenge. 

Conveniently, the stable expression pattern of LHY served as an 
internal control for the quality of RNA preparations and RASL-seq. 
Based on the non-negative matrix factorization (NMF) algorithm”®, 
the 22 RPP4-regulated genes fit best into two clusters (Fig. 3a and 
Supplementary Figs 9 and 10). The membership distance of each gene 
to its cluster is illustrated by the circle radius in Supplementary Fig. 11a.
These two clusters corresponded roughly to the two phenotypic groups 
determined through genetic analysis (Figs 1 and 3a). Most of the 
Cluster 1 genes containing evening element in their promoters are 
involved in R-gene-mediated PCD and were therefore the focus of 
further concern (Fig. 3c). The expression patterns of the Cluster 2 genes 
are shown in Supplementary Fig. 11b. 

Consistent with the fact that the evening element is enriched in the
Cluster 1 gene promoters (Fig. 3a), the weighted mean expression of
these genes largely overlaps with the expression patterns of CCA1
(Fig. 3c). In wild-type control (Col CK), Cluster 1 genes showed a
rhythmic expression pattern with a single sharp peak every evening.
In cca1 (cca1 CK), the expression peaks were greatly diminished, confirming
that CCA1 is an activator of these defence genes.

The rhythmic expression of the defence genes in the absence of 
pathogen indicates that plants are programmed to ‘anticipate’ infection 
according to a circadian schedule. The CCA1-mediated pulse expres- 
sion of the defence genes coincides with the time of Hpa sporulation 
which mainly occurs at night and the time of spore dissemination 
which takes place at dawn’>. To test this, we performed Hpa Emwa1
infection not only at the normal ‘dawn’ infection time but also at ‘dusk’.
We found that if the plants were inoculated at dusk, when infection was
unexpected, significantly higher levels of susceptibility were observed
in both wild type and rpp4 (Fig. 3d). CCA1 clearly has a role in conferring
resistance at dawn because in cca1, more Hpa Emwa1 growth
was observed compared to wild type. However, no further increase in
susceptibility was observed in cca1 if inoculation was carried out at dusk
because CCA1 and the CCA1-regulated defence genes are not
expressed at this time.

In response to Hpa Emwa1 infection, Cluster 1 genes showed
drastically different expression patterns in wild type (Col EMWA1) 
and rpp4 (rpp4 EMWA1) (Fig. 3c). Without RPP4, the expression of 
the defence genes peaked at the 6-, 16- and 24-h time points which 
coincided with the expected time of Hpa spore germination, formation 
of penetration hyphae and establishment of primary haustoria in 
mesophyll cells, respectively’. This pattern of expression may explain 
how these defence genes contribute to the basal resistance against Hpa 
infection. Consistent with CCA1 having a role in basal resistance,
CCA1ox was more resistant to Hpa Noco2 than wild type (Supplementary
Fig. 12). However, understanding the signalling events
between the pathogen and CCA1 leading to this specific timing of
defence gene expression will require future research.

In the presence of RPP4, the 6-h expression peak was diminished 
(Col EMWA1, Fig. 3c). The subsequent perception of the pathogen
effector by RPP4 led to the gradual and sustained expression of defence 
genes. We propose that the prolonged expression of these defence 
genes, which are normally pulse-expressed at dawn, results in PCD 
of the infected host cells and pathogen resistance. Consistent with it 
being a key positive regulator of Cluster 1 defence genes involved in 
RPP4-mediated PCD, knocking out CCA1 function significantly lowered 
the average DIH score (a measure of host cell death) in cca1 after Hpa
Emwa1 infection (Fig. 3e).

How RPP4 interacts with CCA1 to control the defence gene expression
requires further investigation. The spatial resolution of the expression
data from the Hpa Emwa1-infected samples, homogenized from
both infected and uninfected cells, was not enough to allow detailed 
dissection of the contribution of RPP4, CCA1 and unknown pathogenic 

Figure 3 | The circadian regulator, CCA1, controls the defence gene
expression and the timing of immune responses. a, Enrichment of evening …
of NMF Cluster 1 genes. CI, confidence interval; CK, control; EMWA1, Hpa
Emwa1 inoculated. White bars, day; black bars, night. d, SPP count after Hpa
Emwa1 infection at dawn or dusk (n = 3). e, Occurrence of DIH 7 dpi by Hpa …

signals. Nevertheless, RPP4 clearly is not only a target gene of CCA1,
but also a partner of CCA1 in regulating the defence genes as their 
patterns of expression were disturbed in rpp4 (Fig. 3c and Supplemen- 
tary Fig. 2c). 

Establishment of a molecular link between the plant circadian clock 
and R-mediated defence reveals a new interface between the plant host 
and biotrophic pathogens. Although the interactions between R genes 
and the circadian clock have yet to be studied genetically and at the 
molecular level, this study indicates a central role of the circadian clock 
in balancing growth and defence. As summarized in Fig. 3f, we hypo- 
thesize that the Cluster 1 genes are pulse-expressed to minimize 
adverse effects to the host in anticipation of infection under normal 
conditions and during basal defence. In contrast, detection of a patho- 
genic effector by the R protein may disrupt this control, leading to PCD 
of the infected cell and R-mediated resistance, which is a much stron- 
ger and signal-specific immune response. There is also an increasing 
body of evidence indicating that animal immune response is influ- 
enced by the circadian clock’. Understanding the molecular link 
between the circadian clock and immunity therefore has broad impli- 
cations in biology. 


METHODS SUMMARY 


Hyaloperonospora arabidopsidis (Hpa) propagation and inoculation were per- 
formed as described***. Ten-day-old plants were inoculated with the asexual 
spores suspension (5 X 10° spores per ml) of Hpa. Unless specified, the Hpa 
infection was always performed at dawn of the growth chamber’s photoperiod. 
Hpa Emwa1-inoculated samples were collected at 0, 0.5, 2 and 4 days post inoculation
(dpi). ATH1 GeneChip (Affymetrix) was used for microarray. The arrays
were normalized and analysed as described previously~’. Disease phenotypes were 
scored after trypan blue staining at 7 dpi**. Significance of the phenotypic scores 
was determined based on binomial distribution. Disease phenotypic analysis was 
performed using hierarchical clustering with distance measured by the standard 
correlation (average linkage; scale 0-1). The significance of the clustering (boot- 
strap 100,000 times) was measured by the approximately unbiased P-values 
(0-100%, the higher the number the more significant*’). Callose deposition was 
detected after aniline blue staining’*. Accumulation of phenolic compounds was 
examined under ultraviolet illumination (Leica). Root length and fresh weight 
assays for elf18 sensitivity were performed as described previously’. The evening 
element enrichment was determined based on hypergeometric distribution. 
Samples for RASL-seq were prepared according to ref. 19. Non-negative matrix 
factorization algorithm was used to cluster the genes*®. RNA extraction was per- 
formed as described previously”. cDNA synthesis (SuperScript III, Invitrogen) 
and quantitative PCR (SYBR Green, Qiagen) were performed according to the 
manufacturer’s protocols. For Pseudomonas infection, 4-week-old plants were 
inoculated with 10 mM MgCl2 or Pseudomonas syringae maculicola ES4326 with
or without the effector AvrRpt2 (OD600 = 0.001). The in planta bacterial growth
was measured at 3 dpi. For diurnal luciferase measurement, protein was extracted 
and bioluminescence intensity was measured using the Luciferase Assay System 
(Promega) according to manufacturer’s protocol. Ten-day-old plate-grown plants 
were used for free-running test (details in Methods). 
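
The binomial scoring of phenotype counts mentioned above can be illustrated with a short sketch. This is a minimal example under stated assumptions, not the authors' script: the function name, the 10% background rate and the count of 26 leaves are hypothetical, and only the sample size of 40 leaves per genotype comes from the Fig. 1 legend.

```python
from math import comb


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


if __name__ == "__main__":
    # Hypothetical case: a phenotype seen in 26 of 40 mutant leaves,
    # tested against an assumed background rate of 10% in wild type.
    p_value = binomial_upper_tail(26, 40, 0.10)
    print(f"P(X >= 26 | n = 40, p = 0.10) = {p_value:.3g}")
```

A score would be starred (*P < 0.05) when this tail probability falls below 0.05; how the background rate was chosen is not stated in the text, so it is left as a parameter here.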


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 
METHODS

Arabidopsis and Hyaloperonospora arabidopsidis (Hpa) growth conditions. 
Arabidopsis seedlings were grown for 10 days at 16-18 °C, 12-h day length, 80-100% 
relative humidity before Hpa infection through spray of a spore suspension (5 X 10° 
spores per ml in distilled H2O) at dawn according to the photoperiod of the plant
growth chamber. Hpa Emwa1 and Hpa Noco2 were subcultured and inocula prepared
using methods modified from previous reports*”.

RNA extraction and quantitative PCR analysis. RNA extraction was performed 
as described previously’. cDNA synthesis (SuperScript III, Invitrogen) and quantitative
PCR (SYBR Green PCR kit, Qiagen) were performed according to the manufacturer’s
protocols.

Microarray. Ten-day-old wild-type and rpp4 seedlings were inoculated with Hpa 
Emwa1 and samples were collected at 0, 0.5, 2 and 4 days after inoculation. Total
RNA was isolated from the frozen material using the Qiagen RNeasy kit. RNA 
probes were labelled using the GeneChip Eukaryotic Small Sample Target 
Labelling Assay Version II and hybridized on the Affymetrix ATH1 GeneChip 
(Santa Clara). Two biological replicates were performed. The data presented in this 
publication have been deposited in NCBI’s Gene Expression Omnibus and are 
accessible through GEO Series accession number GSE22274
(http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE22274).

Normalization and mixed-model analysis. The mixed-model software used to 
normalize globally all arrays and to identify differentially expressed probe sets was 
as described previously”*. Expression indices were used to calculate P- and q-values 
for pairwise comparisons of all probe sets across all treatments. R² values for the
CEL files are as follows: Col_0d, 1 vs 2 (0.98); Col_0.5d, 1 vs 2 (0.97); Col_2d, 1 vs
2 (0.98); Col_4d, 1 vs 2 (0.96); rpp4_0d, 1 vs 2 (0.98); rpp4_0.5d, 1 vs 2 (0.97);
rpp4_2d, 1 vs 2 (0.98); rpp4_4d, 1 vs 2 (0.97).

Phenotyping mutants in response to Hpa infection. Seven days after inoculation
with Hpa Emwa1, phenotypes were scored following lactophenol
trypan blue staining”. Leaves were vacuum-infiltrated twice in a solution of phenol,
lactic acid, glycerol and water (1:1:1:1) plus 2.5 mg ml⁻¹ trypan blue. The tubes
containing the samples were placed in a boiling water bath for 2 min and allowed to
cool overnight. The leaves were destained in the chloral hydrate solution and
then treated with 70% glycerol. Whole leaves were analysed and photographed with 
a MZ8 stereo microscope (Leica) and a PM-C35 camera (Olympus). Detailed 
examination of Hpa structures was conducted with an Olympus BX60F compound 
microscope and differential interference contrast (DIC) optics. Leaves were stained 
for callose as described with modifications”. To visualize callose, leaves were cleared 
in a solution of ethanol and acetic acid (3:1), stained with 0.02% aniline blue in 
100 mM sodium phosphate buffer (pH 9) for 1 h, and examined with an Axio
imager wide field fluorescence microscope (Zeiss). To detect phenolic compounds, 
leaves were examined under ultraviolet fluorescent illumination (Leica DMRB). To 
measure Hpa Noco2 infection, infected leaves were collected in 1 ml water, and 
sporangiospores were counted. 

elf18 and flg22 treatment. Wild-type Arabidopsis (Col-0), efr and candidate 
mutant plants were grown on MS+1% sucrose medium (pH5.7) plate with 
1M elf18 or MS alone as a control under continuous light for 9 days for root 
length assay’. For fresh weight assay, plants were grown on MS+1% sucrose 
medium (pH 5.7) plate for 4 days (16/8 light/dark cycle), transferred into water 
containing 50nM elf18 or water as a control, and fresh weight was measured 
6 days after elf18 treatment. flg22 (10 nM) was infiltrated into 3-week-old plants 
for the induction of callose deposition. 

Promoter element analysis. The statistical significance of over-represented tran- 
scription factor (TF) binding elements was calculated using a hypergeometric 
probability model. The following equation was used to provide the P-values: 
N is the total number of promoters in the genome, n is the number of promoters
in the genome containing the specified TF-binding element, m is the size of the 
selected set of promoters, and x is the number of promoters with the specified 
element in the selected set. Because multiple hypotheses were tested in the analysis, 
the Bonferroni correction was used. The genome-wide occurrences of these ele- 
ments in the promoters are used as controls. 
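
The enrichment calculation described above can be sketched in a few lines. This is an illustration under assumptions rather than the authors' code: it takes the P-value to be the upper tail of the hypergeometric distribution, P(X ≥ x), which is the usual form for over-representation tests, and the function names and the numbers in the usage example are hypothetical. The symbols N, n, m and x follow the definitions in the text.

```python
from math import comb


def hypergeom_enrichment_p(N: int, n: int, m: int, x: int) -> float:
    """Upper-tail hypergeometric probability P(X >= x).

    N: total promoters in the genome
    n: promoters in the genome containing the element
    m: size of the selected promoter set
    x: promoters in the selected set containing the element
    """
    upper = min(n, m)
    favourable = sum(comb(n, i) * comb(N - n, m - i) for i in range(x, upper + 1))
    return favourable / comb(N, m)


def bonferroni(p: float, n_tests: int) -> float:
    """Bonferroni-adjusted P-value, capped at 1."""
    return min(1.0, p * n_tests)


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to show the call pattern.
    p = hypergeom_enrichment_p(N=30000, n=1500, m=22, x=6)
    print(f"raw P = {p:.3g}, Bonferroni (100 elements tested) = {bonferroni(p, 100):.3g}")
```

Exact integer arithmetic with `math.comb` avoids the rounding problems that arise when the binomial coefficients are computed in floating point for genome-sized N.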

RASL-seq. The growth conditions (12/12 light/dark cycle, 16-18 °C, 80-100% 
humidity), which were optimized for Hpa infection, were different from those used 
in traditional circadian studies'*’*. Samples were collected every 2 h after inoculation 
and the remaining plants were kept to ensure successful pathogen inoculation. Total 
RNA for each sample (1 µg) was used for RASL-seq. Primer (gene-specific with
flanking 5’ or 3’ universal sequences) annealing to mRNA and ligation were carried 
out according to ref. 19. Bar-coded primers were then added to each sample to 
convert the ligated products to individual libraries, which were pooled from all 
samples and subjected to multiplex sequencing using Solexa GAII (Illumina). 
RASL-seq data analysis. The readings from RASL-seq were assumed to follow a Poisson
distribution. Only those samples with mean readings significantly above zero
(Pr(mean = 0) < 0.01) were considered for further analysis. The reading for each
sample was first divided by the corresponding reading of control, ubiquitin 5 
(UBQ5; AT3G62250), and then standardized. The resulting matrix was used for 
clustering analysis. 
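
The normalization and standardization steps above can be sketched as follows. The text does not define "standardized", so this example assumes a per-gene z-score across samples using the population standard deviation; the function names and the read counts are hypothetical.

```python
from statistics import mean, pstdev


def normalize_to_control(gene_reads, control_reads):
    """Divide each sample's reading by the control (e.g. UBQ5) reading of the same sample."""
    if len(gene_reads) != len(control_reads):
        raise ValueError("gene and control readings must cover the same samples")
    return [g / c for g, c in zip(gene_reads, control_reads)]


def standardize(values):
    """Convert values to z-scores (zero mean, unit population standard deviation)."""
    mu = mean(values)
    sd = pstdev(values)
    if sd == 0:
        raise ValueError("cannot standardize a constant profile")
    return [(v - mu) / sd for v in values]


if __name__ == "__main__":
    # Hypothetical readings for one gene and for the control across four samples.
    gene = [120, 480, 950, 210]
    ubq5 = [1000, 1100, 980, 1050]
    print(standardize(normalize_to_control(gene, ubq5)))
```

Applying `standardize` to each gene's normalized profile yields one row of the matrix that is passed on to clustering.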

Non-negative matrix factorization (NMF) algorithm” was used to cluster the genes.
The number of the clusters was determined by comparing the cophenetic correlation 
coefficient for a range of cluster numbers (from 2 to 22). The cophenetic correlation 
coefficient is a measurement of how faithfully the result of NMF clustering preserves the
pairwise distances between the original data points. As shown in Supplementary Fig. 9, 
two clusters generated the highest cophenetic correlation coefficient, which means two 
clusters can reflect the original data more faithfully than more clusters. Divergence was 
used as the update rule and cost measurement. The minimum of the data was subtracted
from the data matrix to ensure that there were no negative numbers in the matrix.
Because the NMF algorithm iteratively updates the decomposition of the data matrix, 
300 runs with 10,000 iterations/run were performed to reach the convergence 
(Supplementary Fig. 10). The membership indicators from NMF clustering were used 
as weights to calculate the weighted mean expression pattern shown in Fig. 3c. The 
weights were also used to determine the radii of circles in Supplementary Fig. 11a. 
A smaller radius indicates a higher membership of the gene to the corresponding cluster.
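
The clustering procedure above can be illustrated with a compact sketch of NMF using the divergence cost and its multiplicative update rule. This is a pure-Python illustration under assumptions, not the software used in the study: it assumes genes are rows and samples are columns, performs a single run with a small default iteration count (the text reports 300 runs of 10,000 iterations), and all function names are hypothetical. The cophenetic correlation step is not shown.

```python
import math
import random

EPS = 1e-12  # guards against division by zero and log(0)


def shift_nonnegative(V):
    """Subtract the global minimum so that no entry is negative."""
    lowest = min(min(row) for row in V)
    return [[v - lowest for v in row] for row in V]


def matmul(W, H):
    """Plain matrix product of W (rows x k) and H (k x cols)."""
    k, cols = len(H), len(H[0])
    return [[sum(w_row[a] * H[a][j] for a in range(k)) for j in range(cols)]
            for w_row in W]


def divergence(V, WH):
    """Generalized Kullback-Leibler divergence D(V || WH)."""
    total = 0.0
    for v_row, wh_row in zip(V, WH):
        for v, wh in zip(v_row, wh_row):
            if v > 0:
                total += v * math.log(v / (wh + EPS))
            total += wh - v
    return total


def nmf_divergence(V, k, iterations=200, seed=0):
    """Factorize V ~ W H with multiplicative updates for the divergence cost."""
    rng = random.Random(seed)
    rows, cols = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(rows)]
    H = [[rng.random() + 0.1 for _ in range(cols)] for _ in range(k)]
    for _ in range(iterations):
        WH = matmul(W, H)  # fixed while H is updated
        for a in range(k):
            w_col_sum = sum(W[i][a] for i in range(rows)) + EPS
            for j in range(cols):
                ratio = sum(W[i][a] * V[i][j] / (WH[i][j] + EPS) for i in range(rows))
                H[a][j] *= ratio / w_col_sum
        WH = matmul(W, H)  # recomputed before W is updated
        for a in range(k):
            h_row_sum = sum(H[a][j] for j in range(cols)) + EPS
            for i in range(rows):
                ratio = sum(H[a][j] * V[i][j] / (WH[i][j] + EPS) for j in range(cols))
                W[i][a] *= ratio / h_row_sum
    return W, H


def assign_clusters(W):
    """Assign each row (gene) to the component with the largest weight."""
    return [max(range(len(row)), key=row.__getitem__) for row in W]
```

In this sketch each row of W plays the role of the membership indicators: the largest entry gives the cluster assignment, and the row could be rescaled to serve as weights for a weighted mean expression profile.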
Bioluminescence detection. Protein was extracted and bioluminescence intensity 
measured using the Luciferase Assay System (Promega) according to the manufac- 
turer’s manual. A Victor3 (PerkinElmer) multilabel reader was used to detect the 
bioluminescence. Substrate (100 µl) was added using an automatic injector. After 3 s
shaking, 2 s delay, the signal was captured for 20 s. Log) transformation was performed 
to the raw signals to ensure the normal distribution of the data. After subtraction of the 
blank, the data were normalized according to the total protein concentrations deter- 
mined by the Bradford method (Bio-Rad). The resulting data were then standardized. 
Free-running test. Seeds were sterilized in 2% Plant Preservative Mixture (PPM, 
Plant Cell Technology) in the dark at 4 °C for 4 days before plating on MS plate (3% 
sucrose, 1.5% agar) and grown in a 12/12 h light/dark growth chamber for 9 days. 
At the dawn and the dusk of the ninth day, 2.5 mM luciferin in 0.05% Triton-X 100 
was sprayed onto the seedlings. At the dawn of the tenth day, the seedlings were
treated with distilled H2O or Hpa Emwa1 before being placed in a constant light
chemiluminescence box. The bioluminescence signals were captured by CCD. 
The activated B-cell-like (ABC) subtype of diffuse large B-cell
lymphoma (DLBCL) remains the least curable form of this malig- 
nancy despite recent advances in therapy’. Constitutive nuclear 
factor (NF)-κB and JAK kinase signalling promotes malignant cell
survival in these lymphomas, but the genetic basis for this signalling 
is incompletely understood. Here we describe the dependence of 
ABC DLBCLs on MYD88, an adaptor protein that mediates toll 
and interleukin (IL)-1 receptor signalling”’, and the discovery of 
highly recurrent oncogenic mutations affecting MYD88 in ABC 
DLBCL tumours. RNA interference screening revealed that 
MYD88 and the associated kinases IRAK1 and IRAK4 are essential 
for ABC DLBCL survival. High-throughput RNA resequencing 
uncovered MYD88 mutations in ABC DLBCL lines. Notably, 29% 
of ABC DLBCL tumours harboured the same amino acid sub- 
stitution, L265P, in the MYD88 Toll/IL-1 receptor (TIR) domain 
at an evolutionarily invariant residue in its hydrophobic core. This 
mutation was rare or absent in other DLBCL subtypes and Burkitt's 
lymphoma, but was observed in 9% of mucosa-associated lymphoid 
tissue lymphomas. At a lower frequency, additional mutations were 
observed in the MYD88 TIR domain, occurring in both the ABC 
and germinal centre B-cell-like (GCB) DLBCL subtypes. Survival of 
ABC DLBCL cells bearing the L265P mutation was sustained by the 
mutant but not the wild-type MYD88 isoform, demonstrating that 
L265P is a gain-of-function driver mutation. The L265P mutant 
promoted cell survival by spontaneously assembling a protein 
complex containing IRAK1 and IRAK4, leading to IRAK4 kinase 
activity, IRAK1 phosphorylation, NF-κB signalling, JAK kinase
activation of STAT3, and secretion of IL-6, IL-10 and interferon-β.
Hence, the MYD88 signalling pathway is integral to the pathogenesis 
of ABC DLBCL, supporting the development of inhibitors of IRAK4 
kinase and other components of this pathway for the treatment of 
tumours bearing oncogenic MYD88 mutations. 

The current molecular taxonomy of DLBCL distinguishes three 
main subtypes: ABC, GCB and primary mediastinal B-cell lymphoma 
(PMBL)*. Current therapy is least successful in ABC DLBCL, achieving 
less than a 40% cure rate’. The anti-apoptotic NF-κB signalling pathway
is constitutively active in ABC DLBCL owing to oncogenic
CARD11 mutations or chronic active B-cell receptor signalling, aug- 
mented by inactivation of A20°*. A subset of ABC DLBCLs use JAK 


kinase signalling to activate the transcription factor STAT3, a pathway 
that synergizes with NF-κB in promoting cell survival’"®. The oncogenic
aetiology of this JAK-STAT3 signalling has not been elucidated.

We conducted an RNA interference (RNAi) screen for genes that 
are required for proliferation and survival of lymphoma cell lines and 
identified three small hairpin RNAs (shRNAs) targeting MYD88 that 
were toxic to two ABC DLBCL lines but not to two GCB DLBCL lines 
(Supplementary Fig. 1a). During normal immune responses, MYD88 
functions as a signalling adaptor protein that activates the NF-κB
pathway after stimulation of toll-like receptors (TLRs) and receptors 
for IL-1 and IL-18 (refs 2, 3). MYD88 coordinates the assembly of a 
multi-subunit signalling complex consisting of various members of the 
IRAK family of serine-threonine kinases’. The initial RNAi screen 
also identified two shRNAs targeting IRAK1 as toxic for one or both 
of the ABC DLBCL lines, but not for GCB DLBCL lines. A subsequent 
screen identified additional MYD88 and IRAK1 shRNAs that were 
toxic to all five ABC DLBCL lines tested but had little effect on GCB 
DLBCL, Burkitt’s lymphoma, mantle cell lymphoma and multiple 
myeloma lines (Supplementary Fig. 1a). Using shRNAs targeting the 
3’ untranslated regions of MYD88 and IRAK1, which reduced expres- 
sion of their respective proteins (Supplementary Fig. 1c), we showed 
that ABC DLBCL cells could be rescued from shRNA-mediated toxicity 
by coexpression of coding region cDNAs (IRAK1, Supplementary Fig. 
1d; MYD88, see below). MYD88 and IRAK1 shRNAs displayed a time- 
dependent toxicity for ABC DLBCL lines and induced apoptosis, but 
had little effect on GCB DLBCL and myeloma lines (Fig. 1 and Sup- 
plementary Fig. 1b, e). Together these data establish that MYD88 and 
IRAK1 are required to maintain the viability of ABC DLBCL cells. 

To comprehensively discover somatic mutations in ABC DLBCL, we 
used high-throughput resequencing of mRNA to search for sequence 
variants in four ABC DLBCL lines. In addition to known mutations in 
CARD11 and CD79B, we identified a single nucleotide variant that 
changed a leucine residue at position 265 of the MYD88 coding region 
to proline (L265P) in all four ABC DLBCL lines tested. This variant 
resides in the MYD88 TIR domain, which interacts with TIR domains 
of various receptors during innate immune responses and also mediates 
homotypic interactions’*"’. 

To extend this finding, we resequenced the MYD88 coding region in 
382 lymphoma biopsy samples. The L265P mutation was by far the most 
Figure 1 | MYD88 is required for survival of ABC DLBCL cells. MYD88 and
IRAK1 shRNAs have selective toxicity for ABC DLBCL lines. Shown is the
fraction of GFP⁺, shRNA-expressing cells relative to the GFP⁻, shRNA-negative
fraction at the indicated times, normalized to the day 0 values.


common variant observed, occurring in 29% of ABC DLBCL biopsies. 
By contrast, this mutation was rare or absent among DLBCLs of the GCB 
and PMBL subtypes and among Burkitt’s lymphomas (Fig. 2b). Of note, 
MYD88 L265P was also observed in 9% of gastric mucosa-associated
lymphoid tissue (MALT) lymphomas. Most MYD88 L265P mutations 
appeared heterozygous by sequencing, but six biopsy samples and one 
ABC DLBCL line (OCI-Ly3) were homozygous. By array-based com- 
parative genomic hybridization’, 56% (15 of 27) of the ABC DLBCL 
cases with gain or amplification of the MYD88 locus had the L265P 
mutation, compared to 29% (13 of 45) with wild-type MYD88 copy 
number (P = 0.023), indicating selection by the cancer cells for this 
mutant allele. A host of other, less common MYD88 mutations were 
equally distributed among ABC and GCB DLBCL cases (Fig. 2a, b). 
Whereas most mutations were in the TIR domain, one mutation 


(V52M) was in the death domain and two were between the death 
and TIR domains (S149G/I). Six ABC DLBCL lines had a MYD88 
mutation (Fig. 1), whereas all 14 GCB DLBCL lines tested were wild 
type. In 13 DLBCL cases for which matched germline DNA was 
available, the MYD88 mutations (L265P, V217F, S219C, M232T, 
S243N, T294P) were confirmed to be somatically acquired. Overall, 
MYD88 mutations were observed in 39% of ABC DLBCLs (Fig. 2b), 
establishing MYD88 as among the most frequently altered genes in 
this malignancy. 
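
The copy-number comparison reported above (15 of 27 cases with gain or amplification versus 13 of 45 with wild-type copy number, P = 0.023) can be checked with a short calculation. The text does not name the statistical test, so this sketch assumes a one-sided Fisher's exact test on the 2 × 2 table; the function name is hypothetical.

```python
from math import comb


def fisher_one_sided(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact test for the 2 x 2 table [[a, b], [c, d]].

    Returns the probability of a top-left count of at least `a`
    when all row and column totals are held fixed.
    """
    row1, col1, total = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    favourable = sum(comb(col1, i) * comb(total - col1, row1 - i)
                     for i in range(a, upper + 1))
    return favourable / comb(total, row1)


if __name__ == "__main__":
    # Rows: MYD88 locus gain/amplification, wild-type copy number.
    # Columns: L265P present, L265P absent (counts taken from the text).
    p = fisher_one_sided(15, 27 - 15, 13, 45 - 13)
    print(f"one-sided P = {p:.3f}")
```

The printed value can be compared with the reported P = 0.023; a two-sided or chi-squared test would give a somewhat different figure.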

The MYD88 mutations partially overlapped with abnormalities in 
CD79B/A, A20 and CARD11 in ABC DLBCL tumours (Fig. 2c). Among 
cases with a MYD88 L265P mutation, 34% had a coincident CD79B/A 
mutation whereas this overlap was significantly less common among 
ABC DLBCLs without a CD79B/A mutation (18%; P = 0.03). These 
data raise the possibility of a functional interaction between the chronic
active B-cell receptor signalling that is associated with CD79B/A muta- 
tions® and the signalling that is instigated by the MYD88 L265P muta- 
tion. Some cases had MYD88 L265P as well as a CARD11 mutation, 
which strongly activates NF-κB, suggesting that the MYD88 mutation
confers additional biological attributes beyond NF-κB activation.

The location of the MYD88 mutations within the three-dimensional 
structure of the MYD88 TIR domain was both surprising and instruc- 
tive (Fig. 2d). The L265P mutation occurs at a residue that is invariant 
in evolution and contributes to a β-sheet at the hydrophobic core of the
domain. Another mutation, M232T, affects a methionine that is in an 
adjacent β-sheet and contacts the leucine affected by L265P. A cluster
of mutations were in the ‘B-B loop’, an evolutionarily conserved region 
that mediates TIR domain interactions'®. Two other mutations, S222R
and S243N, alter an adjoining face of the TIR domain. Only one 
mutant affects the opposite side of the TIR domain (T294P), altering 
the conserved ‘box 3’ motif that is important in IL-1 signalling”. 

To examine whether the MYD88 mutants confer a gain or loss of 
function, we performed a complementation experiment in which we 
knocked down endogenous MYD88 in ABC DLBCL lines and ectopi- 
cally expressed wild-type or mutant MYD88 coding regions. In ABC 
Figure 2 | MYD88 mutations in human lymphomas. a, MYD88 missense
mutations in lymphoma biopsies and cell line models of ABC DLBCL (light 
blue), GCB DLBCL (orange), MALT lymphoma (purple) and Burkitt’s 
lymphoma (BL; dark blue). Amino acid positions are shown according to 
protein accession NP_002459. b, Frequencies of MYD88 mutations in biopsy 
samples from different lymphoma subtypes. c, Overlap of MYD88 mutations 
with other recurrent genetic alterations in ABC DLBCL tumour specimens. 
Genetic subsets were defined by somatic mutations and, in the case of the A20 
subset, by homozygous deletion or epigenetic silencing. d, Location of MYD88
mutations in the three-dimensional structure of the MYD88 TIR domain. 

e, Dependence of ABC DLBCLs on MYD88 L265P. A 3'-UTR-directed MYD88 
shRNA was inducibly expressed in the indicated ABC DLBCL lines, which were 
stably transduced with rescue vectors expressing wild-type or L265P MYD88 
coding regions, or with an empty vector. Shown is the fraction of viable shRNA- 
expressing cells relative to the shRNA-negative fraction, normalized to day 0 
values. 
DLBCL lines harbouring an L265P mutation, MYD88 L265P rescued
the cells after MYD88 knockdown, but wild-type MYD88 was ineffec- 
tive (Fig. 2e), although these MYD88 isoforms were expressed equiva- 
lently (data not shown). Hence, these ABC DLBCLs are ‘addicted’ to 
the action of the L265P MYD88 mutant, indicating that it is a gain-of- 
function driver mutation that confers a selective advantage during the 
evolution of ABC DLBCL tumours. 

To assess the biochemical and functional consequences of the MYD88 
mutations, we fused green fluorescent protein (GFP) to MYD88 and 
introduced the fusion proteins into DLBCL lines. Immunoprecipitation 
of MYD88-GFP with anti-GFP antibodies brought down IRAK1 and 
IRAK4, two kinases known to associate with MYD88 upon TLR or 
IL-1 stimulation (Fig. 3a)’. During IL-1 signalling, IRAK1 becomes 
hyperphosphorylated by IRAK4, resulting in slowly migrating IRAK1 
isoforms’. In cells bearing the MYD88 L265P, a prominent, slow- 
migrating IRAK1 species co-immunoprecipitated with MYD88 
(Fig. 3a). By contrast, wild-type MYD88 did not associate strongly with 
these IRAK1 isoforms nor did the other MYD88 mutants tested 
(Fig. 3b). Treatment with λ-phosphatase collapsed the slow-migrating
IRAK1 species into a single band, confirming that they are phosphory-
lated IRAK1 isoforms (Fig. 3c). Phosphorylation of endogenous IRAK1
was observed in an ABC DLBCL line with L265P but not in a GCB 
DLBCL line (Fig. 3c). Thus, the MYD88 L265P mutant nucleates a 
signalling complex in ABC DLBCLs that includes phosphorylated 
IRAK1, consistent with a gain-of-function phenotype.

IRAK4 co-immunoprecipitated with MYD88, but it associated 
equivalently with wild-type and L265P MYD88 (Fig. 3a). Knockdown 
of IRAK4 was toxic for ABC DLBCL lines but not for GCB DLBCL and 
myeloma lines (Fig. 3d and Supplementary Fig. 1c). Wild-type IRAK4 
rescued ABC DLBCL lines after IRAK4 shRNA induction, but a kinase- 
dead IRAK4 isoform could not (Fig. 3e), despite equivalent expression 
(data not shown). By contrast, IRAK1 kinase activity was not required 
for the survival of ABC DLBCL cells (Supplementary Fig. 1d). A select- 
ive small-molecule inhibitor of IRAK1 and IRAK4 kinase activity’” 
killed ABC DLBCL lines but not GCB DLBCL and myeloma lines 
(Fig. 3f). Together, these findings demonstrate that ABC DLBCLs rely 
upon IRAK4 kinase activity to transduce signals from MYD88 L265P 
that promote survival. 

To investigate signalling pathways that are engaged by MYD88 
L265P, we knocked it down in an ABC DLBCL line and profiled the 
ensuing gene expression changes (Supplementary Table 1 and Sup- 
plementary Fig. 2). We identified 285 genes that were down-modulated 
after MYD88 knockdown, and searched for overlap between this 
MYD88 signature and previously defined gene expression signatures’®
(Supplementary Table 2). The most significantly enriched signature 
reflects NF-κB signalling in ABC DLBCL (44× enrichment,
P=24%X10 '*°). This signature was also inhibited after IRAK1
knockdown (Supplementary Fig. 3), indicating that IRAK1 mediates 
NF-κB activation by MYD88 L265P. To compare the ability of wild-
type and mutant MYD88 isoforms to activate NF-κB, we expressed
them as GFP fusion proteins in a GCB DLBCL line with little endo-
genous NF-κB activity. Whereas wild-type MYD88 activated an
NF-κB-dependent reporter modestly, L265P had strong activity, as did
M232T and S243N, whereas S222R and T294P had an intermediate
effect (Fig. 4b). At all MYD88 expression levels, L265P was superior to
wild-type MYD88 in upregulating CD83, a previously established
NF-κB target in this system° (Fig. 4c). Other MYD88 mutants induced
CD83 to varying degrees but all were more active than wild-type
MYD88. Thus, mutant MYD88 isoforms can contribute to the con-
stitutive NF-κB activation that typifies ABC DLBCL”.

A signature of JAK kinase signalling in ABC DLBCL overlapped 
significantly with the MYD88 signature (Fig. 4a; 14x enrichment, 
P=9.6 X 10 *) and with IRAK1-regulated genes (Supplementary 
Fig. 3b). This was notable because autocrine secretion of IL-6 and 
IL-10 drives JAK-STAT3 signalling in a subset of ABC DLBCLs’. 
MYD88 knockdown significantly diminished the secretion of IL-6 and 
Figure 3 | MYD88 mutations are gain-of-function. a, An altered IRAK1
isoform associated with MYD88 L265P. The GCB DLBCL line BJAB was 
transduced with GFP-tagged wild-type (WT) or L265P MYD88, or with an 
empty vector (null). Anti-GFP immunoprecipitates (IP) and input lysates were 
examined by immunoblotting for IRAK1, IRAK4 and MYD88. b, Preferential 
association of an altered IRAK1 isoform with MYD88 L265P. BJAB cells were 
transduced with the indicated GFP-tagged MYD88 isoforms and examined as 
in a. c, MYD88 L265P associates with phosphorylated IRAK1. Top panel: BJAB
cells were transduced with the indicated GFP-tagged MYD88 isoforms. Anti-
GFP immunoprecipitates were treated with λ-phosphatase as indicated and
examined by immunoblotting for IRAK1 or MYD88. Bottom panel: anti-
IRAK1 immunoprecipitates from HBL1 (ABC) or BJAB (GCB) cells were
treated with λ-phosphatase as indicated and examined by immunoblotting for
IRAK1. d, Toxicity of IRAK4 shRNAs for ABC DLBCLs. The indicated lines 
were transduced with retroviruses expressing IRAK4 shRNA and the relative 
number of shRNA⁺ cells is plotted versus time after shRNA induction,
normalized to day 0. Data are representative of experiments with three different 
IRAK4 shRNAs. e, IRAK4 kinase activity is required for ABC DLBCL survival. 
The indicated ABC DLBCL lines were transduced with retroviruses expressing 
wild-type or kinase-dead IRAK4 isoforms, or with an empty vector (-ctrl). The 
survival of cells after induction of an IRAK4 shRNA is shown. f, A small- 
molecule IRAK1/4 kinase inhibitor is selectively lethal for ABC DLBCLs. 
Viability of the indicated lines was measured after treatment for 3 days with 
various inhibitor concentrations and normalized to DMSO-treated cells. 

g, IRAK4 kinase activity regulates IL-6 and IL-10 secretion. The indicated 
cytokines were measured in the supernatant of ABC DLBCL lines after 
treatment for 24h with various concentrations of the IRAK1/4 inhibitor. 


IL-10 as well as the phosphorylation of STAT3 in several ABC DLBCL 
lines (Fig. 4d, e and Supplementary Fig. 1f). IL-6 and IL-10 secretion was 
also blocked by the IRAK1/4 kinase inhibitor, indicating that IRAK4 
links MYD88 L265P signalling to the expression of these cytokines
(Fig. 3g). Previous work identified a ‘STAT3-high’ subgroup of ABC
DLBCL tumours with autocrine IL-6/IL-10 signalling and STAT3 phos-
phorylation, which was missing in a ‘STAT3-low’ subgroup’. The
MYD88 L265P mutation was significantly more common in the
STAT3-high subgroup (37%) than in the STAT3-low subgroup (13%) 
(P = 0.0036), and other MYD88 mutations were modestly enriched 
among STAT3-high cases as well (Fig. 4f). These data indicate that 
MYD88 mutations contribute to JAK-STAT3 signalling in ABC DLBCL
tumours. 

The MYD88 signature included a number of genes that are induced 
by type I interferon (Fig. 4a; 7 enrichment, P = 4.3 X 10 ny which is 
intriguing given that MYD88 signalling can induce interferon (IFN)-β
production by innate immune cells. IFN-β was measurable in the super-
natant of the OCI-Ly3 ABC DLBCL line, and MYD88 knockdown
diminished its secretion (Fig. 4d). MYD88 knockdown decreased
IFN-β mRNA levels in the HBL1 line (Supplementary Fig. 2), although
IFN-β secretion was below the detection limit. Future work should
address whether the secretion of immunomodulatory cytokines such
as IFN-β, IL-6 and IL-10 influences immune cells in the microenviron-
ment of ABC DLBCL tumours.

Given the pleiotropic action of MYD88 L265P, we investigated 
whether MYD88 signalling cooperates with B-cell receptor signalling 
Figure 4 | MYD88 mutants activate NF-κB and cytokine signalling. a, Venn
diagram of genes affected by MYD88 knockdown in the HBL1 ABC DLBCL 
line, grouped according to membership in gene expression signatures. 

b, MYD88 mutants constitutively activate NF-κB. BJAB cells co-expressing the
indicated MYD88-GFP mutants and a NF-κB-driven luciferase reporter
construct were assayed for luciferase activity, which was normalized to the 
expression levels of each MYD88-GFP isoform. a.u., arbitrary units. 

c, Correlation of MYD88 protein levels with CD83 expression. BJAB cells 
bearing GFP-tagged MYD88 isoforms were assessed for CD83 and GFP 
expression. Cells were assigned to equally sized bins based on their GFP levels, 
and the mean fluorescence intensity (m.f.i.) of CD83 in each bin was plotted. 
d, MYD88 knockdown decreases cytokine secretion in ABC DLBCL. MYD88 
or control (ctrl) shRNAs were induced in ABC DLBCL lines, and the indicated 
cytokines were measured in the supernatant over time. e, STAT3 
phosphorylation in ABC DLBCL depends on MYD88. MYD88 or control (ctrl) 
shRNAs were induced in ABC DLBCL lines, and cells were assessed by 
immunoblotting for phosphorylated STAT3 (pY-STAT3), total STAT3, 
MYD88 and β-actin. f, Preferential association of MYD88 mutant isoforms
with the STAT3-high subgroup of ABC DLBCL tumours. See text for details. 
g, MYD88 and B-cell-receptor signalling pathways cooperate to maintain ABC 
DLBCL survival. OCI-Ly10 ABC DLBCL cells were first transduced with 
retroviruses expressing MYD88 or control shRNAs and then infected with 
retroviruses expressing CD79A, CARD11, or control shRNAs along with GFP. 
The relative viability of GFP⁺ cells is plotted, normalized to day 0 values. All
error bars are s.e.m. (n = 3). 


to maintain ABC DLBCL survival. Knockdown of MYD88 enhanced 
the killing of an ABC DLBCL line with chronic active BCR signalling 
by CD79B or CARD11 shRNAs (Fig. 4g). This finding indicates that
MYD88 and B-cell receptor signalling provide non-redundant survival
signals to ABC DLBCL cells, in keeping with the fact that some ABC
DLBCL tumours harbour MYD88 L265P as well as CD79B or CARD11
mutations (Fig. 2c). 

Our genetic and functional data establish a new oncogenic pathway 
in lymphomagenesis. Somatically acquired MYD88 mutations in ABC 
DLBCL promote NF-κB and JAK-STAT3 signalling, which mediate
cell survival in this lymphoma type”’’. MYD88 L265P was the most 
biologically potent mutant and was unique in its ability to coordinate a 
stable signalling complex containing phosphorylated IRAK1, which 
probably accounts for its high recurrence among lymphomas. MYD88 
L265P also genetically links MALT lymphoma with ABC DLBCL, two 
lymphoma subtypes that share other oncogenic features”**”°””, Other 
MYD88 mutations may also be drivers of lymphomagenesis given their 
recurrent nature and ability to activate NF-κB. From a therapeutic
standpoint, the signalling complex coordinated by MYD88 L265P 
represents an enticing target. Our study also provides a genetic method 
to identify patients whose tumours may depend upon MYD88 signal- 
ling and who may therefore benefit from therapies targeting IRAK4 
alone or in combination with agents targeting the B-cell receptor®, NF-κB
or JAK-STAT3 pathways’.


METHODS SUMMARY 


RNAi screens, doxycycline-inducible shRNA expression and shRNA toxicity 
assays were described previously”. RNA interference screening results are listed 
in Supplementary Table 3. The sequences of individual shRNAs described in the 
figures are given in Supplementary Table 4. Gene expression profiling data have 
been submitted to GEO under accession number GSE22900. 

Pre-treatment tumour biopsies were obtained from patients with de novo DLBCL, 
Burkitt’s lymphoma and gastric MALT lymphoma. Samples were analysed as per a 
protocol approved by the National Cancer Institute Institutional Review Board. 
Assignment of DLBCL specimens to subtypes was performed as described’. High- 
throughput RNA sequencing was accomplished using an Illumina GAIIx instrument.


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 
METHODS

Cell lines. Cell lines were cultured in RPMI 1640 medium supplemented with 
penicillin/streptomycin and 10% fetal bovine serum or, for the OCI series of cell 
lines, Iscove’s medium with 20% fresh human plasma. Cells were maintained in a 
humidified, 5% CO₂ incubator at 37 °C. All cell lines were engineered to express an
ecotropic retroviral receptor and the bacterial tetracycline repressor as previously 
described”. 

Retroviral vectors and retroviral transduction. A previously described retroviral 
vector, pRSMX”°, was used to express shRNA for the library screen. A modified 
version of this vector, pRSMX-PG, in which the puromycin selectable marker was 
fused with the green fluorescent protein (GFP), was used to co-express shRNA
and GFP for the shRNA toxicity assays. Retroviral transduction was performed by
transfecting the retroviral vector and a mixture of helper plasmids for a mutant 
ecotropic envelope and gag and pol into 293T cells using Lipofectamine 2000 
(Invitrogen). Retroviral supernatants were harvested 48 h after transfection and
were used to transduce ecotropic receptor-expressing target cells by centrifugation
at 2,500 r.p.m. for 1.5 h in the presence of 8 µg ml⁻¹ polybrene.

shRNA library screening. Pools of shRNA library were screened as previously 
described”. Briefly, pools of roughly 1,000 shRNA expressing retroviral vectors 
were used to transduce target cell lines. After puromycin selection, stable integrants 
were induced to express shRNA by doxycycline (20 ng ml⁻¹). Parallel uninduced
cultures were kept as controls. After 3 weeks of shRNA induction, genomic DNA
from both uninduced and induced cultures was harvested. shRNA-associated bar
code sequences from the genomic DNA were PCR amplified and in vitro tran- 
scribed, as described”’. The transcribed RNA products were labelled fluorescently 
with either Cy5 (induced) or Cy3 (uninduced) using the Universal Linkage 
System (Amersham Biosciences) and hybridized onto microarrays containing 
DNA oligonucleotides complementary to the bar code sequences, as described”. 
Each bar code experiment was performed in quadruplicate, and the microarray 
results for each bar code were averaged. The complete screening results are 
presented in Supplementary Table 3, which includes some data that have been 
previously published*”’. 

shRNA sequences. The shRNA sequences used in individual experiments are 
listed in Supplementary Table 4. 

shRNA toxicity and complementation assays. shRNA toxicity was assayed as 
described”*. Briefly, the pRSMX-PG vector that co-expresses shRNA and GFP was 
transduced into lymphoma and multiple myeloma cell lines. Two days after
retroviral transduction, doxycycline was added to induce shRNA expression. The
fraction of GFP⁺, shRNA-expressing cells relative to the GFP⁻, shRNA⁻ fraction was
monitored over various time points by flow cytometry and plotted against the
same GFP⁺, shRNA-expressing fraction on day 0 of doxycycline induction. The
reduction of the GFP⁺, shRNA-expressing fraction at later time points indicates
shRNA toxicity. Complementation studies were performed in the DLBCL cell line 
that harbours the MYD88 L265P mutation. HBL1 cells were transduced with 
retroviral vectors that co-express GFP and shRNA targeting the 3’ UTR of 
MYD88 (or IRAK1). shRNA-transduced cells were subsequently infected with 
retroviruses co-expressing wild-type or mutant MYD88 (or IRAK1) coding 
regions and mouse CD8 (Lyt2). The cell fraction positive for GFP and CD8 (using 
anti-mouse CD8a, BD Pharmingen) was monitored over time by flow cytometry 
as above. TMD8 and OCI-Ly3 cells were first transduced with retroviruses co- 
expressing wild-type or mutant MYD88 (or IRAK1) coding regions and mouse 
CD8 (Lyt2) and enriched for Lyt2 expression with magnetic beads. Enriched cells 
were subsequently infected with retroviral vectors that co-express GFP and an 
shRNA targeting the 3’ UTR of MYD88. The cell fraction positive for GFP and 
CD8 was monitored over time by flow cytometry. 
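
The normalization described above can be illustrated with a minimal Python sketch. It assumes that flow cytometry yields, for each time point, a count of GFP-positive (shRNA-expressing) and GFP-negative cells; the function name and the counts are hypothetical and are not taken from the study.

```python
def relative_shrna_fraction(gfp_pos_counts, gfp_neg_counts):
    """Ratio of GFP+ to GFP- cells at each time point, normalized to day 0.

    Both arguments are sequences ordered by time, with day 0 first.
    """
    if len(gfp_pos_counts) != len(gfp_neg_counts):
        raise ValueError("count lists must have the same length")
    ratios = [pos / neg for pos, neg in zip(gfp_pos_counts, gfp_neg_counts)]
    baseline = ratios[0]  # day 0 of doxycycline induction
    return [ratio / baseline for ratio in ratios]


# Hypothetical counts at successive time points (illustrative only).
gfp_pos = [5000, 4000, 2500, 1000]
gfp_neg = [5000, 5000, 5000, 5000]
normalized = relative_shrna_fraction(gfp_pos, gfp_neg)
print(normalized)
```

A value that falls below 1 over time corresponds to depletion of shRNA-expressing cells, which is how toxicity is read out in the assay.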

For the IRAK4 complementation assay, HBL1, HLY1, TMD8 and SUDHL2 
lymphoma cell lines were first retrovirally transduced with either vLyt2 empty 
vector, or vLyt2 vector expressing wild-type or kinase-dead IRAK4 (K213A/ 
K214A). The infected cells were later enriched using the CD8 microbeads method 
(Miltenyi Biotec) according to the manufacturer’s protocol. The enriched cells were
then infected with retroviral vector pRSMX-PG expressing either a control shRNA 
or an IRAK4 shRNA. Doxycycline was added 2 days after infection to induce 
shRNA expression. The fraction of GFP-positive cells was monitored over time 
by FACS analysis, and a decline in GFP-positive (hence shRNA-expressing) cells indicates
toxicity. 

Synergistic toxicity of MYD88 and either CARD11 or CD79A knockdown. 
OCI-Ly10 cells were first infected with either pRSMX-puro empty vector or 
pRSMX-puro vector encoding an MYD88 shRNA sequence
(5’-GTACCAGTATTTATACCTCTA-3’). Two days after infection the two cell lines
were selected using 1 µg ml⁻¹ puromycin. The selected cells were then retrovirally
infected with pRSMX-PG encoding either a scrambled control or an shRNA against
CARD11 (target sequence 5’-GGGGTGTGTACCAGGCTATGA-3’) or CD79A
(target sequence 5’-GGGGCTTCCTTAGTCATATTC-3’). Doxycycline was
added 2 days after infection to induce shRNA expression. The fraction of GFP-
positive cells was monitored over time by FACS analysis, and a decline in
GFP-positive (hence shRNA-expressing) cells indicates toxicity.

High-throughput RNA sequencing/PCR amplification/Sanger sequencing. 
The standard Illumina pipeline for RNA-seq was used, using paired-end 75- 
base-pair runs with each sample run in one sequencing lane, yielding ~20 million 
reads per sample. Sequences were mapped back to both RefSeq and Ensembl
transcript models using the BWA algorithm”, yielding a median resequencing 
coverage of 10. Single nucleotide variants (SNVs) were reported that deviated 
from the human reference genome sequence, were observed in both sequencing 
directions, represented >20% of the resequencing coverage at a particular base 
pair position, and were not known SNPs in the dbSNP database of NCBI. A total of 
52,160 putative SNVs was detected in the four ABC DLBCL cell lines studied. 
Sequences have been submitted to the NCBI short sequence archive under accession 
SRP003192. On the basis of the criteria above, all SNVs that are not represented in 
publicly available databases of single nucleotide polymorphisms (SNPs) are listed
in Supplementary Table 5. Except for the MYD88 mutations, other SNVs in this 
table have not been validated by independent means. 
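
The four SNV reporting criteria listed above can be expressed as a simple filter. The following Python sketch is illustrative only: the record layout, the function name and the example variants are assumptions, and it is not the pipeline used in the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateVariant:
    """Hypothetical per-position summary of aligned reads."""
    chrom: str
    pos: int
    ref: str                 # reference base
    alt: str                 # observed variant base
    alt_reads_forward: int   # variant reads on the forward strand
    alt_reads_reverse: int   # variant reads on the reverse strand
    total_reads: int         # resequencing coverage at this position


def passes_snv_filter(variant, known_snps, min_fraction=0.20):
    """Apply the four criteria from the text to one candidate."""
    # 1. Must deviate from the reference sequence.
    if variant.alt == variant.ref:
        return False
    # 2. Must be observed in both sequencing directions.
    if variant.alt_reads_forward == 0 or variant.alt_reads_reverse == 0:
        return False
    # 3. Must represent more than 20% of the coverage at this position.
    if variant.total_reads <= 0:
        return False
    alt_reads = variant.alt_reads_forward + variant.alt_reads_reverse
    if alt_reads / variant.total_reads <= min_fraction:
        return False
    # 4. Must not be a known SNP (here a set standing in for dbSNP).
    return (variant.chrom, variant.pos, variant.alt) not in known_snps


# Hypothetical positions and counts, for illustration only.
known_snps = {("chr1", 1000, "G")}
candidates = [
    CandidateVariant("chr1", 1000, "A", "G", 6, 5, 20),  # known SNP
    CandidateVariant("chr2", 2000, "T", "C", 4, 3, 12),  # meets all criteria
    CandidateVariant("chr2", 3000, "C", "T", 5, 0, 10),  # one strand only
    CandidateVariant("chr3", 4000, "G", "A", 1, 1, 20),  # only 10% of reads
]
reported = [v for v in candidates if passes_snv_filter(v, known_snps)]
print([(v.chrom, v.pos) for v in reported])
```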

Sanger sequencing of MYD88 was accomplished with the following primers: 
MYD88-Full-F, 5'-GACCTCTCCAGATCTCAAAAGGCAGATTCC-3' (PCR 
amplification and sequencing, exon 1); MYD88-Full-R, 5’-GCAGAAGTACAT 
GGACAGGCAGACAGATAC-3’ (PCR amplification and sequencing, exon 5); 
MYD88-Seq-E1R, 5'-TCTCTCCATGGGAGACAGGATGCTG-3’ (sequencing 
exon 1); MYD88-Seq-E2F, 5'-TGGGTAAAGAGGTAGGCACTCCCAG-3’ (sequen- 
cing exon 2); MYD88-Seq-E2R, 5’-GCCCATCTGCTTCAAACACCCATGC-3’ 
(sequencing exon 2); MYD88-Seq-E3F, 5’-AAGCCTTCCCATGGAGCTCTGAC 
CAC-3' (sequencing exon 3); MYD88-Seq-E3R, 5’-GCTAGGAGGAGATGCCC 
AGTATCTG-3’ (sequencing exon 3); MYD88-Seq-E4F, 5’-ACTAAGTTGC 
CACAGGACCTGCAGC-3’ (sequencing exon 4); MYD88-Seq-E4R, 5'-ATCCA 
GAGGCCCCACCTACACATTC-3' (sequencing exon 4); MYD88-Seq-E5F, 
5'-GTTGTTAACCCTGGGGTTGAAG-3’ (sequencing exon 5). 

For 155 cases of ABC DLBCL, analysis of CARD11, CD79B, and A20 by Sanger 
sequencing was performed as described**”’. 

A20 was declared epigenetically silenced if the expression level in a case was
more than 2 standard deviations below the mean of other ABC DLBCL cases,
based on previous gene expression profiling data’. Deletion of the A20 locus 
(TNFAIP3) was analysed by quantitative PCR using primers to amplify exon 3 
and exon 6, as described”’, and compared to a reference gene, CHMP4A, that is not 
subject to copy number alterations in ABC DLBCL. A20 was declared deleted if 
one or both of the A20 PCR products had an estimated copy number that was 
more than 3 standard deviations below the average of 9 normal control DNA 
samples. The following quantitative PCR primers were used: CHMP4A-F, 
5'-CTGCAAGGGAGGAGGGGTTTCATTC-3’ (qPCR control gene); CHMP4A-R, 
5'-CITTGGGTGTTCCTTCTGGCCAGTC-3’ (qPCR control gene); A20-E3F, 
5'-ACCTTTGCTGGGTCTTACATGCAG-3' (qPCR A20); A20-E3R, 5’-TAT 
GCCCACCATGGAGCTCTGTTAG-3’ (qPCR A20); A20-E6F, 5’-TGAGATC 
TACTTACCTATGGCCTTG-3’ (qPCR A20); A20-E6R, 5'-TCAGGTGGCTGA 
GGTTAAAGACAG-3' (qPCR A20). 
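
The two threshold rules for A20 (silencing at more than 2 standard deviations below the mean expression of other cases; deletion at more than 3 standard deviations below the mean copy number of normal controls) can be sketched as follows. The use of the sample standard deviation, the function names and all numerical values are assumptions made for illustration.

```python
from statistics import mean, stdev


def is_silenced(case_expression, other_case_expression, n_sd=2.0):
    """True if expression is more than n_sd SDs below the mean of other cases."""
    threshold = mean(other_case_expression) - n_sd * stdev(other_case_expression)
    return case_expression < threshold


def is_deleted(exon_copy_numbers, normal_copy_numbers, n_sd=3.0):
    """True if one or both PCR products fall more than n_sd SDs below normal."""
    threshold = mean(normal_copy_numbers) - n_sd * stdev(normal_copy_numbers)
    return any(cn < threshold for cn in exon_copy_numbers)


# Hypothetical copy-number estimates from 9 normal control DNA samples.
normals = [2.0, 2.1, 1.9, 2.0, 2.05, 1.95, 2.0, 2.1, 1.9]
deleted_case = is_deleted([1.0, 1.9], normals)  # exon 3 estimate is low
intact_case = is_deleted([1.9, 2.0], normals)

# Hypothetical expression values (log scale) from other ABC DLBCL cases.
others = [10.0, 10.5, 9.5, 10.2, 9.8]
silenced_case = is_silenced(8.0, others)
expressed_case = is_silenced(9.5, others)
print(deleted_case, intact_case, silenced_case, expressed_case)
```

Whether the original analysis used the sample or the population standard deviation is not stated in the text; with only 9 controls the choice shifts the deletion threshold slightly.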
Expression vectors and cDNA mutagenesis. The expression vector, vLyt2- 
MYD88-EGFP, encoding carboxy terminus EGFP-tagged MYD88 was con- 
structed by three-way ligation of PCR-generated MY D88 and EGFP products into 
the pBMN-IRES-Lyt2 vector (provided by G. Nolan). The restriction site Sall was 
included in the MYD88 PCR reverse primer and the EGFP PCR forward primer to 
facilitate the ligation between MYD88 and EGFP. MYD88 or EGFP PCR products 
were generated using the following primer pairs: 5’-TAGTAGGGATCCG 
CCGCCACCATGCGACCCGACCGCGCTGA-3' (MYD88), 5’-TAGTAGGTC 
GACGGGCAGGGACAAGGCCTTGGC-3’ (MYD88), 5’-TAGTAGGTCGACA 
TGGTGAGCAAGGGCGAGGAG-3' (EGFP), 5’-TAGTAGGCGGCCGCTTA 
CTTGTACAGCTCGTCCAT-3’ (EGFP). 

The expression vector vLyt2-AU1-MYD88 encoding amino terminus AU1- 
tagged MYD88 was constructed by inserting PCR-generated MYD88 into the 
pBMN-IRES-Lyt2 vector. MYD88 PCR product was generated using the following 
primers: 5'-TAGTAGGGATCCGCCGCCACCATGGACACATACCGCTACA 
TCCGACCCGACCGCGCTGAGGCT-3’ and 5’-TAGTAGGCGGCCGCTCAG 
GGCAGGGACAAGGCCTTGGC-3’. 

IRAK1 expression vectors were similarly created by inserting PCR-generated 
IRAK1 cDNAs (from pIND-IRAK1 wild type and K239A kinase-dead templates, a
gift from X. Li) into the pBMN-IRES-Lyt2 vector, using the following PCR pri- 
mers: 5'-TAGTAGCTCGAGGCCGCCACCATGGCCGGGGGGCCGGGC-3’ 
and 5’-TAGTAGGCGGCCGCTCACTTGTCATCGTCGTCCTTGTAGTCGC 
TCTGAAATTCATCACTTTC-3’. 

IRAK4 expression vectors were generated by inserting PCR-generated IRAK4 
cDNA (from a template obtained from the Dana-Farber/Harvard Cancer Center 
DNA Resource Core) into the pBMN-IRES-Lyt2 vector, using the following pri-
mers: 5'-TAGTAGGGATGGGCCGCCACCATGGACACATACCGCTACATC
AACAAACCCATAACACCATCA-3’ and 5'-TAGTAGGCGGCCGCTCAAGA 
AGCTGTCATCTCTTGCAG-3’. 

MYD88 mutants were created with the Phusion site-directed mutagenesis kit 
(New England BioLabs), using either vLyt2-MYD88-EGFP or vLyt2-AU1- 
MYD88 vector as templates. All cDNA inserts from PCR cloning and site-directed 
mutagenesis were verified by sequencing. The MYD88 mutagenesis primers used 
were the following: L265P forward P-CATCAGAAGCGACCGATCCCCATC 
AAG and L265P reverse P-GGCACCTGGAGAGAGGCTGAGTGCAAA; 
M232T forward P-AGGTGCCGCCGGACGGTGGTGGTTGTC and M232T
reverse P-CTTTTCGATGAGCTCACTAGCAATAGA; S243N forward P-GAT
TACCTGCAGAACAAGGAATGTGAC and S243N reverse P-ATCAGAGACA
ACCACCACCATCCGG; T294P forward P-ACCAACCCCTGCCCCAAATCT 
TGGTTC and T294P reverse P-GTAGTCGCAGACAGTGATGAACCTCAG; 
S222R forward P-GGTCTATTGCTAGGGAGCTCATCGAAA and S222R 
reverse P-AGACACAGGTGCCAGGCAGGACATCGC. 

The IRAK4 kinase-dead mutant was generated similarly using the following 
mutagenesis primers: K213A/K214A forward P-ACTGTGGCAGTGGCGGCG 
CTTGCAGCAATG and K213A/K214A reverse P-TGTGTTATTTACGTAGC 
CTTTATATACA. 

MYD88 co-immunoprecipitation. BJAB cells were retrovirally transduced with 
various MYD88-GFP fusion constructs co-expressing a Lyt2 surface marker. Cells 
were enriched for Lyt2 expression using anti-Lyt2 magnetic beads (Invitrogen, 
114.47D) following the manufacturer’s instructions. Enriched cells were lysed at 
10⁷ cells per ml in RIPA buffer (0.5% Triton X-100, 0.5% deoxycholate, 0.05% SDS,
10 mM Tris, pH 8.0, 50 mM NaCl, 10 mM EDTA, 1 mM Na₃VO₄, 30 mM pyro-
phosphate, 10 mM glycerophosphate, 1 mM AEBSF, 0.02 U ml⁻¹ aprotinin and
0.01% NaN₃) for 10 min on ice. Lysates were cleared by centrifuging for 20 min at
14,000g at 4 °C. MYD88-GFP constructs were immunoprecipitated with washed
anti-GFP agarose beads (Chromotek) for 30 min at 4 °C, then washed 3-4 times in
1× RIPA buffer. For λ-phosphatase treatment, the agarose beads were washed two
additional times in 10 mM Tris, pH 8.0 with 50 mM NaCl to remove EDTA and
phosphatase inhibitors. λ-phosphatase (New England Biolabs) treatment was done
according to the manufacturer’s instructions. Reactions were quenched by the
addition of 2× Laemmli sample buffer followed by boiling. Samples were separated
on 10% polyacrylamide gels and transferred to Immobilon-P PVDF membranes
(Millipore) for western blot analysis. Antibodies used for immunoblotting were 
anti-IRAK1 rabbit polyclonal (Santa Cruz Biotech), anti-IRAK4 rabbit polyclonal 
(Cell Signaling Technologies) and anti-MYD88 rabbit monoclonal (Cell Signaling 
Technologies). 

NF-κB reporter assay. BJAB cells retrovirally expressing MYD88-GFP constructs
(see above) were transduced with lentiviral particles containing a NF-κB firefly
luciferase reporter construct by following the manufacturer’s instructions (SA 
Biosciences). Firefly luciferase activity was measured using the Dual-Luciferase 
Reporter Assay System (Promega) following the manufacturer’s instructions. 
Luminescence from equivalent amounts of lysate was read in triplicate on a 
Microtiter Plate Luminometer (Dyn-Ex Technologies). All readings were normalized 
to the mean fluorescence intensity of MYD88-GFP expression for each MYD88 
mutant as determined by FACS analysis on a FACScalibur flow cytometer (Becton 
Dickinson). 

Western blotting. Cells were lysed in lysis buffer (50 mM Tris pH 7.4, 150 mM 
NaCl, 1% Triton X-100, 1% NP-40, 2mM EDTA) supplemented with Complete 
Protease Inhibitor Cocktail Tablets (Roche) and phosphatase inhibitors (Sigma) 
for 30 min. Lysates were cleared by centrifugation at 15,000g at 4 °C for 10 min and
protein concentrations were determined by BCA protein assay (Pierce). 80-100 µg
of lysates were subjected to electrophoresis through a 4-12% Bis-Tris gel
(Invitrogen) and immobilized on nitrocellulose membranes. Proteins were
detected using the following antibodies: MYD88, IRAK4, β-actin, STAT3 and
p-STAT3 (Y705) (Cell Signaling Technology).

IRAK1 immunoprecipitation. Cells were lysed at 10⁷ cells per ml in RIPA buffer
as described above. Lysates were pre-cleared with protein-A agarose beads (Pierce)
before incubation with 1 µg ml⁻¹ of anti-IRAK1 polyclonal antibody (Santa Cruz
Biotech, H-273) for 2 h on ice. Protein-A agarose beads were added to lysates and
rotated for 1 h at 4 °C, then washed three times with 1× RIPA buffer. λ-phosphatase
treatment was performed as described above. Samples were separated on 10% poly-
acrylamide gels and transferred to Immobilon-P PVDF membranes (Millipore) for
western blot analysis. 

Cytokine measurement. The culture medium of cells transduced with inducible 
shRNAs was replaced with fresh medium plus doxycycline, and the concentrations 
of IL-6, IL-10 or IFN-β in culture supernatants at the indicated times were measured
by ELISA (R&D Systems). Alternatively, unmanipulated lymphoma cell lines were
placed into fresh media with the addition of the IRAK1/4 inhibitor (EMD Chemicals)
and assessed for cytokines as above. The results were normalized to live cell numbers, 
and are representative of at least two independent experiments. 

Apoptosis measurements. HBL1 cells were retrovirally transduced with either 
control or MYD88-specific shRNAs, as described above. shRNA expression was 
induced with doxycycline and cells were evaluated for apoptosis 2, 3 and 4 days
after induction. To measure apoptosis, cells were first fixed for 10 min with 1.5% 
paraformaldehyde, centrifuged and then fixed and permeabilized in cold methanol 
overnight. Methanol-fixed cells were washed three times with FACS buffer (PBS 
with 1% FBS) and stained with PE rabbit anti-active caspase 3 (BD Pharmingen) 
and Alexa Fluor 647 mouse anti-cleaved PARP (Asp 214) (BD Pharmingen) for 
20 min at room temperature in the dark, followed by an additional wash with 
FACS buffer. Stained cells were subjected to FACS analysis (FACScalibur, BD) and 
apoptotic cells were defined as double positive for both active caspase 3 and 
cleaved PARP. 

Cell viability assay by MTS. The described DLBCL and multiple myeloma cell
lines were plated in duplicate at a density of 50,000 cells per well in 96-well plates
along with DMSO as negative control, or different concentrations of IRAK1/4 
inhibitor (EMD Chemicals). Cell viability at 1, 2 and 3 days after drug treatment 
was assayed by adding 3-(4,5-dimethylthiazol-2-yl)-5-(3-carboxymethoxyphenyl)- 
2-(4-sulphophenyl)-2H-tetrazolium and an electron coupling reagent (phenazine 
methosulphate; Promega), incubated for 3h and measured by the amount of 
490 nm absorbance using a 96-well plate reader. The presented data were derived 
from 3 days of drug treatment. The assay was performed twice. 
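
As a minimal sketch of how such readings might be reduced to a viability value, the Python function below averages duplicate 490 nm absorbance readings and expresses treated wells as a percentage of the DMSO control. Background subtraction is not described in the text and is omitted; the function name and the absorbance values are hypothetical.

```python
from statistics import mean


def percent_viability(treated_absorbance, dmso_absorbance):
    """Mean absorbance of treated wells as a percentage of the DMSO control."""
    control = mean(dmso_absorbance)
    if control <= 0:
        raise ValueError("DMSO control absorbance must be positive")
    return 100.0 * mean(treated_absorbance) / control


# Hypothetical duplicate wells at one inhibitor concentration.
dmso_wells = [1.0, 1.0]
treated_wells = [0.5, 0.25]
viability = percent_viability(treated_wells, dmso_wells)
print(viability)
```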

MYD88 and IRAK1 signature analysis. To generate a gene expression signature 
of MYD88 signalling in ABC DLBCL, the HBL1 cell line was transduced with
retroviral vectors expressing either shMYD88-4 or shMYD88-7. After puromycin
selection, shRNA expression was induced for 24 or 48 h and gene expression was
measured relative to parallel uninduced cultures using Agilent 4×44K oligonu-
cleotide microarrays. A set of 284 MYD88 target genes was selected as those that
were downregulated by 0.4 log₂ in ≥3 arrays. A signature of NF-κB signalling
(NF-κB-10 signature; http://lymphochip.nih.gov/signaturedb/) in ABC DLBCL was
generated by treating HBL1 cells with the IκB kinase-β inhibitor MLN120B for
2 h, 3 h, 4 h, 6 h, 8 h, 12 h, 16 h and 24 h. Genes that were downregulated >0.4 log₂
in at least four arrays with a one-sided t-test P value of <0.01 were chosen. A signature of
JAK signalling in ABC DLBCL (JAKUp-2 signature; http://lymphochip.nih.gov/
signaturedb/) was generated by treating HBL1 cells with JAK inhibitor I (5 µM;
Calbiochem) for 2 h, 4 h, 6 h and 8 h. Genes were chosen that were decreased in
expression by >0.4 log₂ at ≥3 time points. A signature of IFN signalling (IFN-3;
http://lymphochip.nih.gov/signaturedb/) was curated as the union between three 
published gene expression signatures of type I interferon signalling (IFN-1, IRF3-1 
and Module-3.1 signatures; http://lymphochip.nih.gov/signaturedb/). A Fisher’s 
exact test was used to calculate the significance of the overlap between the MYD88 
signature and the other signatures. 
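
The overlap statistic can be sketched in a few lines of Python. For enrichment, a one-sided Fisher's exact test is equivalent to the upper tail of the hypergeometric distribution, which is what the function below computes, together with the fold enrichment over the overlap expected by chance. Whether the original analysis was one-sided, and the size of the gene universe, are not stated in the text; both are assumptions here, and the numbers are illustrative.

```python
from math import comb


def overlap_enrichment(n_universe, n_sig_a, n_sig_b, n_overlap):
    """Fold enrichment and one-sided Fisher (hypergeometric) P value.

    n_universe: number of genes measured (assumed background)
    n_sig_a, n_sig_b: sizes of the two signatures
    n_overlap: number of genes shared by both signatures
    """
    expected = n_sig_a * n_sig_b / n_universe
    fold = n_overlap / expected
    # P(overlap >= n_overlap) when n_sig_b genes are drawn at random.
    max_overlap = min(n_sig_a, n_sig_b)
    favourable = sum(
        comb(n_sig_a, k) * comb(n_universe - n_sig_a, n_sig_b - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    p_value = favourable / comb(n_universe, n_sig_b)
    return fold, p_value


# Small illustrative example: two 5-gene signatures in a 20-gene universe
# that overlap completely.
fold, p_value = overlap_enrichment(20, 5, 5, 5)
print(fold, p_value)
```

Exact integer arithmetic via `math.comb` keeps very small P values accurate until the final division, which matters for overlaps as strong as those reported for the NF-κB signature.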

Similar methods were used to generate a signature of IRAK1 signalling in ABC 
DLBCL. Two ABC DLBCL cell lines, HBL1 and TMD8, were transduced with 
retroviruses expressing shIRAK1-3 or a control shRNA. After puromycin selec- 
tion, shRNA expression was induced for 24h or 48h and RNA and relative gene 
expression in shIRAK1 and control shRNA-expressing cells was analysed by gene
expression profiling as above. A signature of 350 genes was selected as those that
were downregulated by 0.4 log2 in ≥3 arrays.

Cell-type-specific replication initiation programs set
fragility of the FRA3B fragile site


Anne Letessier, Gaël A. Millot, Stephane Koundrioukoff, Anne-Marie Lachagès, Nicolas Vogt,

Common fragile sites have long been identified by cytogeneticists
as chromosomal regions prone to breakage upon replication
stress. They are increasingly recognized to be preferential targets
for oncogene-induced DNA damage in pre-neoplastic lesions and
hotspots for chromosomal rearrangements in various cancers.
The instability of common fragile sites has been attributed to
sequences prone to form secondary structures that may
impair replication fork movement, possibly leading to fork collapse
resulting in DNA breaks. Here we show, in contrast to this view,
that the fragility of FRA3B—the most active common fragile site in 
human lymphocytes—does not rely on fork slowing or stalling but 
on a paucity of initiation events. Indeed, in lymphoblastoid cells, 
but not in fibroblasts, initiation events are excluded from a FRA3B 
core extending approximately 700 kilobases, which forces forks 
coming from flanking regions to cover long distances in order to 
complete replication. We also show that origins of the flanking 
regions fire in mid-S phase, leaving the site incompletely replicated 
upon fork slowing. Notably, FRA3B instability is specific to cells 
showing this particular initiation pattern. The fact that both origin
setting and replication timing are highly plastic in mammalian
cells explains the tissue specificity of common fragile site instability
we observed. Thus, we propose that common fragile sites correspond
to the latest initiation-poor regions to complete replication
in a given cell type. For historical reasons, common fragile sites 
have been mapped essentially in lymphocytes. Therefore, common
fragile site contribution to chromosomal rearrangements in tumours 
should be reassessed after mapping fragile sites in the cell type from 
which each tumour originates. 

Replication forks commonly face sequences intrinsically difficult to
replicate, such as natural pause sites. Various observations have
indicated that the sequence of common fragile sites may also constitute
a challenge to fork movement. For example, it has been shown that
replication is more delayed along common fragile sites than in the rest
of the genome following treatment with aphidicolin, a DNA polymerase
inhibitor, and that the nucleotide sequences of common fragile
sites contain subregions with the potential to form secondary structures.
Hence, it was suggested that helicases tend to travel uncoupled
from polymerases in common fragile sites, giving rise to long stretches
of single-stranded DNA (ssDNA) upon replication stress. In subregions
able to adopt secondary structures, ssDNA formed ahead of
polymerases may evolve into fork barriers that cause DNA breaks.
However, genome-wide analyses have not confirmed that common
fragile sites are specifically enriched in sequences prone to form secondary
structures, and a recent study of the replication dynamics
along FRA6E did not show fork slowing along the site. The mechanism(s)
involved in common fragile site instability have thus remained
ill-defined.

We focused on FRA3B, the most active common fragile site in human
lymphocytes. It overlaps the 1.5-Mb-long FHIT (fragile histidine triad)
tumour suppressor gene, wherein lies the most fragile subregion of
the site. To elucidate the molecular basis of FRA3B fragility, we analysed
cells pulse-labelled successively with iododeoxyuridine (IdU) and then 
chlorodeoxyuridine (CldU), two thymidine analogues. We combined 
the DNA-combing technique with fluorescent detection of newly 
synthesized DNA and fluorescence in situ hybridization (FISH) with 
probes allowing identification of a 1.6-Mb-long region overlapping the 
FHIT gene (referred to as the FHIT locus) (see Methods, Supplementary 
Figs 1, 2 and Supplementary Table 1). This procedure permitted us to 
determine replication-fork speed, to search for potential fork stalling, 
and to map initiation and termination events along this large single- 
copy sequence (Supplementary Fig. 1). 

We studied Epstein—Barr-virus-immortalized human B lymphocytes 
(JEFF cells) (Supplementary Fig. 3), a cell type that shows a high fre- 
quency of breaks at FRA3B upon aphidicolin treatment (see later). We 
compared fork speeds along the FHIT locus and in the bulk genome in 
untreated and aphidicolin-treated cells (Fig. 1a). Statistical analyses
show that differences found between treated and untreated cells are
significant but not those between the locus and the bulk genome, regardless
of the growth conditions. Fork speeds seem to be very heterogeneous
along the locus, a phenomenon previously observed in a non-fragile
region studied by molecular combing. Plotting the speed of each fork
as a function of its position along the locus shows that slow forks do not
cluster in any particular subregion (Supplementary Fig. 4). Together, these
results indicate that the FHIT locus is not a mammalian equivalent
of the slow-replicating zones described in yeast cells.

Next we asked whether forks stall along the FHIT locus as they do at 
programmed pause sites in prokaryotes and yeasts. Stalling should lead
to asymmetrical forks, namely to individual forks presenting unequal 
IdU and CldU tracks (see Methods and Supplementary Fig. 1b). 
Therefore, we calculated asymmetry as the ratio of the longest to the 
shortest track (Fig. 1b). We found that fork asymmetry increases sig- 
nificantly upon aphidicolin treatment but, again, no differences were 
found between the locus and the bulk genome regardless of the growth 
conditions. Moreover, in untreated and aphidicolin-treated cells we 
found 9 and 22 forks along the locus, respectively, that have an asym- 
metry factor greater than or equal to 1.5. The mapping of these forks 
shows that they are evenly distributed (Fig. 1c). Lastly, mapping of all the 
forks—asymmetrical or not—travelling along the locus showed that no 
accumulation in particular subregions occurs in either untreated or 
aphidicolin-treated cells (Supplementary Fig. 5). Thus, forks do not 
specifically stall along the FHIT locus. 

We then studied the distribution of initiation and termination events 
along the locus in untreated cells (Fig. 2a and Supplementary Fig. 1b). 
Notably, no initiation event maps to within an approximately 700-kb- 
long region (referred to as the FRA3B core) centred on exon 5. In 
contrast, ten and eight initiation events, respectively, take place in each 
of the approximately 500-kb-long regions flanking the core. Statistical 
analysis indicates that the deficit in initiation events in the core is 

Figure 1 | Comparison of fork properties in the FHIT locus and in the bulk
genome in JEFF cells. a, b, Distributions of fork speed (a) and fork asymmetry
(b). Only non-truncated forks (see Supplementary Fig. 1) as determined by
DNA counterstaining (see Supplementary Fig. 2) were recorded. Forks
travelling in the locus are represented by orange circles (0 µM aphidicolin,
n = 22; 0.6 µM aphidicolin, n = 44) and in the bulk genome by black circles
(0 µM aphidicolin, n = 283; 0.6 µM aphidicolin, n = 216). Horizontal grey lines
represent the medians of fork distributions. Medians and P values are indicated
above the distributions. c, Upper panel shows FHIT gene (orange box) with its
exons (E1 to E10). Arrows indicate position of forks with an asymmetry factor
greater than or equal to 1.5 (0 µM aphidicolin, n = 9; 0.6 µM aphidicolin,
n = 22). The number above some arrows indicates colocalized forks.


significant (see Methods). Large regions devoid of initiation events have 
been recently associated with transition zones separating domains with 
different replication timings. Long-travelling forks move unidirectionally 
along these zones, from the earliest to the latest domain, leading to 
progressive establishment of replication delays. In contrast, forks
travel in both directions within the FRA3B core (Supplementary Fig. 3) 
and termination events appear evenly distributed along the entire locus 
(Fig. 2a). Thus, in most cells, converging forks that emanate from each 
flanking region merge in the core, indicating that the FHIT locus is not a 
classical transition region. 

We also mapped the initiation and termination events along the 
FHIT locus in aphidicolin-treated cells (Fig. 2b). We observed 23 
initiation events, the distribution of which confirms the paucity of 
these events in the core. Despite a lower coverage of the locus the 
number of initiation events we observed in flanking regions seems 
comparable to that in untreated cells. Slowing replication speed thus 
triggers the recruitment of latent origins in flanking regions, as previously
observed at other loci. In addition, comparable levels of
recruitment were observed in flanking regions and in the bulk genome 
(Supplementary Table 2). Notably, an approximately 900-kb-long 
region overlapping the core was devoid of any termination events. 
This deficit is not explained by preferential breakage of fibres bearing 
X-shaped structures in aphidicolin-treated cells because the density of 
termination remains normal in flanking regions. Instead, with complete
replication of the core taking about 40 h with fork speed
decreased to 0.14 kb min⁻¹, terminations could occur only in cells that
were close to completing replication when aphidicolin was added. 
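
The figure of about 40 h can be checked with a short calculation. The sketch below assumes, for illustration only, that the approximately 700-kb core is replicated by exactly two converging forks, one from each flanking region, each moving at the constant aphidicolin-slowed speed quoted above; the function name is hypothetical.

```python
def time_to_replicate_hours(gap_kb, fork_speed_kb_per_min, n_forks=2):
    """Hours needed for `n_forks` forks moving at a constant speed to
    replicate an initiation-free gap of `gap_kb` kilobases.

    Assumes the forks share the distance equally (converging forks).
    """
    distance_per_fork_kb = gap_kb / n_forks
    minutes = distance_per_fork_kb / fork_speed_kb_per_min
    return minutes / 60.0


# 700-kb core, two converging forks at 0.14 kb per min.
print(round(time_to_replicate_hours(700, 0.14), 1))
```

Under these assumptions each fork covers 350 kb, which takes roughly 2,500 min, in line with the estimate of about 40 h given in the text.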

These results indicate that a paucity of initiation events contributes 
to fragility, which is in line with previous reports showing that the 
frequency of chromosome rearrangements increases in yeast mutants 
showing reduced replication-origin efficiencies. In addition, a paucity

Figure 2 | Mapping of initiation and termination events along the FHIT
locus. a–d, Untreated (a) and aphidicolin-treated (b) JEFF cells and untreated
(c) and aphidicolin-treated (d) MRC-5 cells. Upper panels show histograms
showing the coverage of the locus per 50-kb window. Each bar represents the
number of fibres fully overlapping the corresponding sequence. Horizontal
grey lines represent the medians of coverage. Middle panels are as in Fig. 1c.
Lower panels, arrows indicate the position of initiation and termination events.
The number above some arrows indicates colocalized events. The total number
of events observed is indicated on the left.


of initiation events accounts for the FRA3B properties reported in the
literature, as follows. Firstly, late replication completion, even in cells
that go unperturbed through S phase, may be due to the long distances
that forks coming from flanking regions have to cover before
merging. The abnormally long delay to complete replication upon
aphidicolin treatment is easily explained by the fact that slowing
down fork movement affects long-travelling forks more profoundly
than forks that need to cover only short distances. Secondly, increased
instability upon inactivation of ATR, which stabilizes stalled forks, has
been considered to support the fork barrier hypothesis. Instead, polymerases
travelling across long distances may need to be stabilized,
notably upon replication stress. Lastly, reduced fragility in cells bearing
deletions in the FHIT gene has been attributed to removal of sequences
prone to form secondary structures. Alternatively, reducing the size
of the core may alleviate incomplete replication in proportion to core
shortening.

To characterize further FRA3B replication, we took advantage of a 
recent genome-wide analysis of replication timing by the Repli-Seq 
technique, which allows reconstitution of fork dynamics during the cell 
cycle. Several human cell types were studied but information concerning
common fragile sites was not exploited. Analysis of a 7-Mb-long
region containing the FHIT locus (Fig. 3a and Supplementary Fig. 6) 
showed that the replication profiles along the locus differ notably 
between fibroblastic (BJ) and lymphoblastoid (GM06990 and H0287) 
cells. Whereas the profiles observed in lymphoblastoid cells are con- 
sistent with the results obtained in JEFF cells by combing, the plateau- 
shaped profile visible all along the FHIT locus in BJ cells indicates that 
initiation events take place within the FRA3B core in these cells. To 

Figure 3 | Relationship between replication profile of the FHIT locus and
FRA3B fragility in lymphoblastoid and fibroblastic cells. a, Left panel shows
Repli-Seq analysis of the replication profiles along a 7-Mb-long region
(chromosome 3: 56961124.5–63961124.5; assembly NCBI36 (hg18))
overlapping the FHIT gene (delimited by the orange lines) in lymphoblastoid
GM06990 cells and fibroblastic BJ cells. Horizontal axis: position along the
region. Vertical axis: density of sequence tags in each 50-kb window along the
analysed region and for each indicated step of the cell cycle. Right panel shows
cartoons illustrating the replication profiles along the FHIT locus (FHIT gene is
delimited by orange lines). b, Number of breaks at FRA3B relative to the total
number of breaks in cells treated with 0.15 µM or 0.6 µM of aphidicolin.
Metaphases were prepared after a 3-h treatment with 200 nM nocodazole. See
also Supplementary Table 3.

confirm and extend this conclusion, we used combing to study the
replication profile of the FHIT locus in MRC-5 fibroblasts. Our results 
show that, both in untreated and aphidicolin-treated cells, initiation 
and termination events are evenly distributed all along the locus 
(Fig. 2c, d and Supplementary Fig. 7). Together, these observations 
are consistent with an increasing number of reports showing that both 
the replication timing and the initiation pattern of chromosomal
domains evolve during differentiation but are conserved among different
individuals in a given cell type.

We reasoned that if the initiation-poor core of lymphoblastoid cells 
is involved in FRA3B instability, the site should not be fragile in fibro- 
blasts. We used molecular cytogenetics to determine the frequency of 
breaks at FRA3B in aphidicolin-treated BJ, MRC-5 and JEFF cells 
(Fig. 3b, Supplementary Fig. 8 and Supplementary Table 3). Notably, 
the frequency of breaks at FRA3B is low in BJ and MRC-5 compared to 
JEFF cells. Long-travelling forks thus have a key role in FRA3B instab- 
ility. However, this feature alone is insufficient to promote fragility as 
there are far fewer common fragile sites than initiation-poor regions in 
the genome. Moreover, it has long been shown that late replication 
completion contributes to FRA3B fragility. Comparison of the rep- 
lication timing in fibroblastic and lymphoblastoid cells (Fig. 3a and 
Supplementary Fig. 6) shows that replication of the FHIT gene ends 
very late in both cell types. Thus, late completion per se is also insuf- 
ficient for fragility. These data indicate that common fragile site fra- 
gility results from the combination of late replication completion and 
paucity of initiation events. To test this prediction, we also analysed the 
Repli-Seq profiles along FRA16D, the second most active common
fragile site in lymphocytes (Supplementary Fig. 9a). We found that
FRA16D shows tissue-specific initiation profiles and replication timing
resembling those found at FRA3B. We also compared the frequency of
breaks at FRA16D in aphidicolin-treated JEFF, MRC-5 and BJ cells
(Supplementary Fig. 9b and Supplementary Table 4). Again, the results
resemble those obtained for FRA3B, which reinforces and generalizes
the mechanism we propose.

Tissue-specific reshuffling of the replication program provides a 
straightforward explanation for the setting of common fragile sites in 
different cell types, which would be very difficult to explain if nucleo- 
tide sequences were responsible for their fragility. In addition, the fact 
that cell-type-specific initiation profiles are maintained across species 
in regions of conserved synteny agrees with the observed conservation
of common fragile sites in different mammalian species.
Thus, we propose that common fragile sites are epigenetically defined
loci that correspond to the latest initiation-poor regions to complete
replication in a given cell type. The paucity of initiation events might
reflect either a cell-type-specific lack of licensed origins or chromatin
organization delaying firing so much that cells enter mitosis before it
occurs. As previously shown in yeast, the mammalian checkpoint
could be unable to delay mitotic onset when only a few long-travelling
forks remain at work, which would allow unscheduled condensation
of incompletely replicated common fragile sites, resulting in DNA
breaks.


METHODS SUMMARY 


For combing experiments, JEFF or MRC-5 cells were grown for 5 h with or without 
0.6 µM aphidicolin. Neo-synthesized DNA was labelled with a pulse of IdU followed
by a pulse of CldU. After DNA combing, Morse-code probes were hybridized
and fibres bearing replication and FISH signals were analysed. The replication fork
speed variable was calculated using $(d_{I} + d_{Cl}) / (t_{I} + t_{Cl})$, with $d_{I}$ and $t_{I}$ being the
measured distance (in kb) and labelling time (in min) for IdU incorporation,
respectively, and $d_{Cl}$ and $t_{Cl}$ the corresponding parameters for CldU incorporation.
The fork asymmetry variable corresponded to $\max(d_{I}/d_{Cl},\ d_{Cl}/d_{I})$, which varies
between 1 (no asymmetry) and $+\infty$ (theoretical maximal asymmetry). Analysis of
Repli-Seq data was carried out as described. Breakage at common fragile sites was
determined by FISH on metaphases with FRA3B or FRA16D bacterial artificial
chromosome (BAC) probes. The total number of breaks was counted after Giemsa 
staining of metaphase spreads (see Methods). 

Full Methods and any associated references are available in the online version of
the paper at www.nature.com/nature. 

METHODS

Cell culture. Epstein-Barr-virus-immortalized human B lymphocytes (JEFF cells) 
were grown in RPMI-1640 plus GlutaMAX-I medium (GIBCO). MRC-5 and BJ cells 
were grown in MEM plus Earle’s salts without L-glutamine medium (GIBCO), 1% 
MEM nonessential amino acids (GIBCO), 1 mM sodium pyruvate (GIBCO) and 
2mM L-glutamine (GIBCO). In addition, all cells were grown with 10% fetal calf 
serum (PAN-Biotech GmbH), 100 µg ml⁻¹ of penicillin and streptomycin (GIBCO).
FISH on metaphases and Giemsa staining. JEFF or MRC-5 cells were grown for 
16 h with 0.15 µM or 0.6 µM aphidicolin before metaphase preparations. Bacterial
artificial chromosomes (BAC) were selected from the human March 2006 
(NCBI36 (hg18)) assembly using the UCSC Genome Browser. FRA3B was probed 
with BAC RP11-32J15 (AC104161), RP11-641C17 (AC104164) and RP11- 
147N17 (AC104300). FRA16D was probed with BAC RP11-105F24 (AC106741) 
and RP11-57106 (AC092724). Bacterial strains containing BAC were spread on 
LB agar plates containing 12.5 µg ml⁻¹ chloramphenicol and grown overnight at
37°C. BAC DNA was then extracted according to the manufacturer’s instructions 
(NucleoBond Xtra Midi Plus; Macherey-Nagel). Probes were biotinylated using the 
BioPrime DNA labelling system (Invitrogen) and purified on Illustra ProbeQuant 
G-50 Micro Columns (GE Healthcare). FISH on metaphases was performed as 
described using 150 ng of each FRA3B BAC probe or FRA16D probe.
Immunodetection was performed by successive incubations in the following reagents: 
(1) Alexa-488-conjugated streptavidin (Invitrogen); (2) biotin-conjugated rabbit 
anti-streptavidin (Rockland); (3) Alexa-488-conjugated streptavidin (Invitrogen). 
Chromosomes were counterstained with 4’,6-diamidino-2-phenylindole (DAPI) 
(Vectashield mounting medium for fluorescence with DAPI; Vector Laboratories) 
and metaphases were observed by fluorescence microscopy. Total breaks, gaps or 
constrictions on chromosomes were observed on metaphase spreads stained by 
Giemsa (Giemsa R solution; Reactif RAL) without pre-treatment to obtain a homo- 
geneous staining of chromosomes. 

DNA preparation for molecular combing. JEFF or MRC-5 cells were grown for 
5 h with or without 0.6 µM aphidicolin. Neo-synthesized DNA was labelled as
described with the following changes: cells were pulse labelled for 30 min with
IdU and for 30 min with CldU when grown in regular medium, or 2 h with IdU
and 2 h with CldU when grown in the presence of aphidicolin. Genomic DNA was
extracted and combing was performed as described.

FISH on combed DNA and immunofluorescence detection of neo-synthesized 
DNA. Morse-code probes were designed as described. Primer pairs and template
BAC or fosmid used for PCR are listed in Supplementary Table 1. BAC and fosmid 
DNA were prepared as described earlier for FISH on metaphases. Hybridization was 
performed as described. Immunodetection was performed by successive incubations
in the following reagents: (1) Alexa-488-conjugated streptavidin (Invitrogen);
(2) biotin-conjugated rabbit anti-streptavidin (Rockland); (3) Alexa-488-conjugated 
streptavidin (Invitrogen), mouse anti-bromodeoxyuridine (BrdU) (BD Biosciences) 
and rat anti-BrdU (AbD Serotec); (4) biotin-conjugated rabbit anti-streptavidin 
(Rockland), Alexa-350-conjugated goat anti-mouse (Invitrogen) and Texas-Red- 
conjugated donkey anti-rat (Jackson ImmunoResearch); (5) Alexa-488-conjugated 
streptavidin (Invitrogen), Alexa-350-conjugated donkey anti-goat (Invitrogen) and 
mouse anti-ssDNA (Millipore); (6) Cy5.5-conjugated goat anti-mouse (Abcam); (7) 
Cy5.5-conjugated donkey anti-goat (Abcam). Antibody incubations, washes and 
slide mounting were performed as reported previously.

Image acquisition and treatment. An Eclipse 90i (Nikon) epifluorescence microscope
connected to a CoolSNAP HQ CCD camera and run by Metamorph software
(Molecular Devices) was used for image acquisition. A ×100 objective was
used for images of metaphases. A ×60 objective was used for combed DNA images
of the FHIT locus and a ×40 objective for those of the bulk genome. For the locus,
two overlays of images were set up for each microscope field. The first one com- 
bined the IdU/CldU and FISH signals to identify replicating fibres of interest. The 
second overlay combined the IdU/FISH and DNA signals to determine the length 
of the DNA fibre bearing the Morse code. Image analyses were performed with 
Photoshop and Illustrator (Adobe). Fork positioning along the locus corresponds 
to the barycentre of the IdU and CldU tracks: $(d_{I} + d_{Cl})/2$, with $d_{I}$ and $d_{Cl}$ being
the lengths (in kb) of the two tracks of the same fork. 

Repli-Seq analysis. Analysis of Repli-Seq data was carried out as described’. 
Statistical analysis. The R environment was used for all analyses. Tables, graphical
analyses and R code are available on request. Statistical significances were
set to P ≤ 0.05 and β ≤ 0.1. Two-tailed tests were systematically used. Type I and II
errors were not controlled by any procedure of correction. The replication fork
speed variable was calculated using $(d_{I} + d_{Cl}) / (t_{I} + t_{Cl})$, with $d_{I}$ and $t_{I}$ being the
measured distance (in kb) and labelling time (in min) for IdU incorporation,
respectively, and $d_{Cl}$ and $t_{Cl}$ the corresponding parameters for CldU incorporation.
The fork asymmetry variable corresponded to $\max(d_{I}/d_{Cl},\ d_{Cl}/d_{I})$, which
varies between 1 (no asymmetry) and $+\infty$ (theoretical maximal asymmetry).


Distribution of these two variables in each class was examined using histogram 
and quantile-quantile plot. In Fig. 1, data were compatible with the Mann- 
Whitney—Wilcoxon test. Normal approximation due to identical values and con- 
tinuity correction were applied. 
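
The two fork variables defined above can be computed per fork as in the following sketch. The function and argument names are illustrative, not taken from the authors' R code, and the example track lengths are made up; only the two formulas come from the text.

```python
def fork_speed(d_idu, d_cldu, t_idu, t_cldu):
    """Fork speed in kb per min: total labelled distance over total labelling time."""
    return (d_idu + d_cldu) / (t_idu + t_cldu)


def fork_asymmetry(d_idu, d_cldu):
    """Ratio of the longest to the shortest track: 1 means no asymmetry."""
    if d_idu < 0 or d_cldu < 0:
        raise ValueError("track lengths must be non-negative")
    if d_idu == 0 or d_cldu == 0:
        # One track missing: theoretical maximal asymmetry.
        return float("inf")
    return max(d_idu / d_cldu, d_cldu / d_idu)


# Hypothetical fork: 45 kb of IdU and 30 kb of CldU, two 30-min pulses.
print(fork_speed(45, 30, 30, 30))
print(fork_asymmetry(45, 30))
```

A fork with these hypothetical tracks would reach the asymmetry factor of 1.5 used in the text as the cut-off for mapping asymmetrical forks.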

In Fig. 1, for non-significant P values, the 10% probability of being wrong when
considering that the medians are not different is related to: (1) Δ = 0.5 kb min⁻¹
without treatment and Δ = 0.03 kb min⁻¹ in the presence of aphidicolin (Fig. 1a);
and (2) Δ = 0.62 without treatment and Δ = 0.80 in the presence of aphidicolin
(Fig. 1b). Δ corresponds to the central parameter difference in the statistical populations
verifying β = 0.1 ($\mu_{Genome} = \mu_{FRA3B/FHIT} + \Delta$, μ being the central parameter
in the population). It gives an estimation of confidence when accepting the $H_0$ null
hypothesis that there is no difference between the two central parameters (means/
medians). Let us consider as an example the replication-fork speed in the absence of
aphidicolin treatment: if the true difference between the FRA3B/FHIT locus median
and the genome median is a minimum of 0.5 kb min⁻¹, then the probability of
being wrong when accepting $H_0$ is a maximum of 10%. In other words, a difference
of 0.5 kb min⁻¹ or less is considered to be non-relevant in this example when $H_0$ is
accepted. The procedure for getting Δ for the replication-fork-speed variable
between the two ‘FRA3B/FHIT’ and ‘genome’ classes in the absence of aphidicolin
is detailed as follows: the two distributions observed were approximately Gaussian,
and thus were supposed to be the true distributions in the two statistical populations,
except that the mean for ‘genome’ was set to $\mu_{FRA3B/FHIT} + \Delta$. Using R, two
random samples of $n_{FRA3B/FHIT} = 22$ and $n_{Genome} = 283$ were generated, and the W
statistic calculated. This procedure was repeated 5,000 times to get a W distribution.
A β value was then obtained by determining the proportion of calculated W inside
the $(W_{0.025}, W_{0.975})$ interval, set by the W distribution when $H_0$ is true (Δ = 0). This
method was applied for increasing values of Δ starting from zero. The whole
procedure was repeated ten times and means of β were calculated for each Δ. In
all cases, the variation coefficient never exceeded 10%. Lastly, the Δ minimal value
that verifies β ≤ 0.1 was kept. The same procedure was applied for the fork asymmetry
variable, except that observed distributions were approximately exponential
(the fitted probability density was $\frac{1}{m-1}\, e^{-(x-1)/(m-1)}$, m being the mean of
the observed distribution). Note that for replication-fork-speed data, a Welch test
was also used (distributions approximately normal) instead of the Mann-Whitney-
Wilcoxon test. However, there was no consequence on the P-value result because of
sample sizes.
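
The simulation used to obtain β for a given Δ can be sketched as below. This is a minimal illustration in Python rather than the authors' R code. The population mean and standard deviation (1.5 and 0.5 kb per min), the use of a single standard deviation for both classes, the reduced default number of repetitions and all function names are assumptions made for the example; only the sample sizes (22 and 283) and the overall logic follow the description above.

```python
import random


def rank_sum_w(sample_a, sample_b):
    """Wilcoxon rank-sum statistic: sum of the ranks of sample_a in the
    pooled data, with tied values receiving their mid-rank."""
    pooled = sorted([(v, 0) for v in sample_a] + [(v, 1) for v in sample_b])
    w = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        mid_rank = (i + 1 + j) / 2.0  # mean of ranks i+1 .. j
        w += mid_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    return w


def simulate_w(rng, n_locus, n_genome, mean, sd, delta, reps):
    """Draw `reps` pairs of Gaussian samples and return their W statistics.
    The 'genome' population mean is shifted by `delta`."""
    ws = []
    for _ in range(reps):
        locus = [rng.gauss(mean, sd) for _ in range(n_locus)]
        genome = [rng.gauss(mean + delta, sd) for _ in range(n_genome)]
        ws.append(rank_sum_w(locus, genome))
    return ws


def estimate_beta(delta, n_locus=22, n_genome=283, mean=1.5, sd=0.5,
                  reps=1000, seed=0):
    """Proportion of W values, simulated under a true shift `delta`, that fall
    inside the central 95% interval of W simulated under H0 (delta = 0)."""
    rng = random.Random(seed)
    null_ws = sorted(simulate_w(rng, n_locus, n_genome, mean, sd, 0.0, reps))
    low = null_ws[int(0.025 * reps)]
    high = null_ws[int(0.975 * reps) - 1]
    alt_ws = simulate_w(rng, n_locus, n_genome, mean, sd, delta, reps)
    return sum(1 for w in alt_ws if low <= w <= high) / reps


if __name__ == "__main__":
    for delta in (0.0, 0.25, 0.5, 1.0):
        print(delta, estimate_beta(delta))
```

Scanning increasing values of Δ and keeping the smallest one for which the estimate drops to 0.1 or below would reproduce the last step of the procedure; the value obtained depends entirely on the assumed mean and standard deviation, so the numbers printed here are not expected to match those reported in the text.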

The procedure we used to calculate that the deficit in initiation events in the core 
is significant both in untreated and aphidicolin-treated JEFF cells (Fig. 2a, b) is 
detailed as follows in the case of the experiment with untreated cells. Let us 
consider that the region without initiation events extends from the b5 to the d2 
probe. The probability for one random initiation event occurring within the b5-d2 
region could be defined as the size of this region divided by the size of the FHIT 
gene (from the beginning of E1 to the end of E10), which corresponds to
674,643 bp / 1,502,095 bp = 0.45. However, this calculation does not take into 
account the coverage of the b5-d2 region and of the rest of the gene by the number 
of fibres analysed. Therefore, another procedure was chosen to calculate this 
probability. First, the size of the DNA segment harbouring an initiation event 
was defined as the distance in kilobases between the two divergent forks (fork 
sizes included). Thus, 13 measurements corresponding to the 13 initiation events 
in the FHIT gene (from the beginning of E1 to the end of E10) were obtained
(range 13-526 kb). The lower value of this range gives an estimation of the min- 
imal size of a DNA segment likely to contain a detectable initiation event. As the 
13-kb value is considered low, the first quartile (102 kb) was taken as a threshold 
value below which a fibre is unlikely to carry such an event. Next, for each fibre, the 
part overlapping the b5-d2 region was measured and all the values over the 102-kb 
threshold were added (10,705 kb). The same was done for each fibre covering the 
rest of the gene (E1-b5 or d2-E10 areas, giving 12,555 kb). Thus, the probability of 
one random initiation event occurring in the FHIT gene localized inside the b5-d2 
region was defined as 10,705 / (10,705 + 12,555) = 0.46. Last, the probability of 
getting n events or less inside the b5-d2 region when 13 events occur randomly 
in the FHIT gene follows the binomial law B(13, 0.46). For zero events, 
P(X = 0) = 3 × 10⁻⁴. The same procedure was applied for aphidicolin-treated
cells. The threshold obtained was 17.6 kb, the binomial law was B(18, 0.36) and
P(X ≤ 2) = 0.02.
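
The coverage-weighted probability and the two binomial tail probabilities quoted above can be recomputed with a few lines of Python. The sketch uses only the figures given in the text; the function and variable names are illustrative.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X following the binomial law B(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


# Coverage-weighted probability that a random initiation event falls in b5-d2
# (untreated cells): summed fibre lengths in kb, taken from the text.
p_core = 10705 / (10705 + 12555)
print(round(p_core, 2))

# Untreated cells: no event in the core out of 13, B(13, 0.46).
print(binom_cdf(0, 13, 0.46))

# Aphidicolin-treated cells: at most 2 events in the core out of 18, B(18, 0.36).
print(binom_cdf(2, 18, 0.36))
```

Both tail probabilities fall below the 0.05 significance threshold stated in the statistical analysis section, which is the basis for calling the deficit in initiation events significant.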

MMSET regulates histone H4K20 methylation and
53BP1 accumulation at DNA damage sites

p53-binding protein 1 (53BP1) is known to be an important mediator
of the DNA damage response, with dimethylation of histone
H4 lysine 20 (H4K20me2) critical to the recruitment of 53BP1 to
double-strand breaks (DSBs). However, it is not clear how 53BP1
is specifically targeted to the sites of DNA damage, as the overall 
level of H4K20me2 does not seem to increase following DNA 
damage. It has been proposed that DNA breaks may cause exposure 
of methylated H4K20 previously buried within the chromosome; 
however, experimental evidence for such a model is lacking. Here we 
found that H4K20 methylation actually increases locally upon the 
induction of DSBs and that methylation of H4K20 at DSBs is 
mediated by the histone methyltransferase MMSET (also known 
as NSD2 or WHSC1) in mammals. Downregulation of MMSET 
significantly decreases H4K20 methylation at DSBs and the subsequent
accumulation of 53BP1. Furthermore, we found that the
recruitment of MMSET to DSBs requires the γH2AX–MDC1 pathway;
specifically, the interaction between the MDC1 BRCT domain
and phosphorylated Ser 102 of MMSET. Thus, we propose that a
pathway involving γH2AX–MDC1–MMSET regulates the induction
of H4K20 methylation on histones around DSBs, which, in
turn, facilitates 53BP1 recruitment.

In response to DNA damage, 53BP1 rapidly relocalizes to the sites of
DNA lesions in a phospho-H2AX (γH2AX)- and MDC1-dependent
manner. 53BP1 is also recruited to the sites of DNA damage through
a second mechanism that involves the binding of the tandem tudor
domains of 53BP1 to methylated histones, with dimethylated H4K20
(H4K20me2) being the known physiological binding site for both
mammalian 53BP1 and its yeast homologue Crb2. However, unlike
H2AX phosphorylation, no increase in the total levels of H4K20me2
was observed after DNA damage. It is also not clear whether 53BP1
recruitment to damage sites through H2AX phosphorylation and through
H4K20 methylation are separate pathways or if they are interconnected.
Studies in Schizosaccharomyces pombe showed that disruption of
both H4K20 methylation and H2AX phosphorylation does not cause
synergistic or additive effects on the DNA damage response, indicating
that they might function in the same pathway.

We examined H4K20 methylation at the sites of DNA damage using 
the cellular system (a HeLa clone carrying the DR-GFP homologous 
recombination reporter), in which expression of exogenous I-SceI introduces
a single DSB in the cell’s genome’. After I-SceI induction of DSBs,
chromatin was immunoprecipitated from the cells using antibodies
directed against mono-, di- or trimethylated H4K20 (H4K20me1/2/3),
and quantitative polymerase chain reaction (qPCR) was used to determine
the relative abundance of H4K20me1/2/3 at the induced break
sites, while standard PCR gave a visual representation of the relative
accumulation of these proteins at the DSB sites. Interestingly,
H4K20me1/2/3 at the I-SceI break site all increased after DSB induction
(Fig. 1a, b), as did the H4K20me2 signal at the sites of DNA damage
induced by laser irradiation (Fig. 1c). Consistent with previous reports,
we did not observe an apparent increase in total H4K20me2 levels’ by
western blot at commonly used ionizing radiation doses (Supplemen- 
tary Fig. 1a), but we did observe a notable increase following high doses 
of ionizing radiation. This indicates that local increases of H4K20me2 at 
DSBs induced by low doses of ionizing radiation are masked from 
detection by western blotting owing to the high basal levels of 
H4K20me2 occurring throughout the genome. 

Next we investigated how the increase of H4K20me2/3 was induced 
at DSBs. It has been proposed that SET8 is mainly responsible for 
H4K20me1'°”, which is required for subsequent di- and trimethylation
of H4K20. SUV420H1 and SUV420H2 are the major enzymes
responsible for H4K20me2 and H4K20me3, respectively'*’. However, 
despite SUV420H1/2 loss and the subsequent lack of most H4K20me2/ 
3, 53BP1 accumulation at DSBs was not abolished and only slightly 
delayed’. We did not observe substantial accumulation of SUV420H1 
at the DSBs, whereas small amounts of SET8 localized to the I-SceI site
both before and after DNA cleavage (Fig. 1d, e). This indicates that 
other histone methyltransferases methylate H4K20 specifically at 
DSBs. Interestingly, we found that MMSET, a newly identified histone 
methyltransferase'*"'°, accumulated at DSBs (Fig. 1d, e and Sup- 
plementary Fig. 1b). 

Consistent with the results obtained from chromatin immunopreci- 
pitation (ChIP) assays, MMSET formed discrete foci after ionizing radi- 
ation, colocalizing with 53BP1 (Fig. 1f). MMSET has been shown to 
methylate H3K36, H3K27 and H4K20'*"*, and misregulation of 
MMSET due to haploinsufficiency in Wolf-Hirschhorn syndrome’ 
and by t(4;14) chromosome translocation in multiple myeloma'*”” indi- 
cates that it has an important role in the pathogenesis of these diseases. 
However, the cellular function of MMSET is largely uncharacterized. 
Our results imply that MMSET has a role in the DNA damage response 
(Fig. 1d-f). In support of this, downregulation of MMSET resulted in 
cellular hypersensitivity to ionizing radiation (Supplementary Fig. 1c). 

We proposed that MMSET regulates the DNA damage response 
through H4K20 methylation at DSBs. As shown in Fig. 2a, b and 
Supplementary Fig. 2a, b, downregulation of MMSET significantly 
decreased H4K20me2/3 at DSBs, but did not significantly affect 
H4K20me1 or H3K36 methylation at DSBs. We reasoned that if
MMSET regulates H4K20me2 at DSBs, then MMSET should regulate 
the recruitment of 53BP1 to DSBs. Indeed, we found that downregula- 
tion of MMSET significantly decreased DNA-damage-induced focus 
formation of 53BP1, but not γH2AX, MDC1 or RNF8, which are
upstream regulators of 53BP1 (Fig. 2c and Supplementary Fig. 2c, 
d). Further, 53BP1 focus formation was defective in cells overexpres- 
sing a truncated MMSET (H929)", whereas in cells expressing full- 
length MMSET (KMS11), 53BP1 focus formation was unaffected 
(Supplementary Fig. 2e). Downregulation of 53BP1 did not affect 
MMSET focus formation (Fig. 2d), indicating that MMSET is an 
upstream regulator of 53BP1. Importantly, downstream signalling 
events regulated by 53BP1, such as CHK2 phosphorylation”, were 
Figure 1 | Induction of H4K20 methylation and recruitment of MMSET
at DSBs. a, d, Examples of ChIP analysis by PCR of indicated proteins
on a DSB induced by I-SceI, where input demonstrates equal amount of
DNA for ChIP. b, e, qPCR of indicated ChIP samples, where the y-axis
represents the relative enrichment of the indicated proteins compared
to the IgG control (after normalization with a PCR internal control to
a locus other than the DSB; data ± s.e.m.; n = 3). c, f,
Immunofluorescence staining of U2OS cells (c) and 293T cells (f) after
indicated treatments, then stained with indicated antibodies. HA,
haemagglutinin; IR, ionizing radiation.


Figure 2 | MMSET is required for H4K20me2/3 and 53BP1 accumulation at
DSBs. a, qPCR analysis of ChIP samples from HeLa DR-GFP cells
transfected with control or MMSET shRNA, where the y-axis represents
the relative enrichment of the indicated proteins compared to the IgG
control (data ± s.e.m.; n = 3). b, Ethidium bromide staining of ChIP
samples from a analysed by PCR, where input demonstrates equal loading
of DNA for PCR. c, d, Immunofluorescence of HCT116 cells transfected
with the indicated siRNA or shRNA, irradiated (5 Gy), and stained with
indicated antibodies. e, Phosphorylation (p) of CHK1/2 in the cell
lysates from c analysed by immunoblotting.
impaired by downregulation of MMSET (Fig. 2e and Supplementary
Fig. 2f). To determine whether MMSET methyltransferase
activity is required for these processes, we mutated the critical residue 
(F1117) required for MMSET methyltransferase activity'’. We re- 
introduced short hairpin RNA (shRNA)-resistant wild-type MMSET 
or MMSET(F1117A) to cells stably transfected with MMSET shRNA. 
As shown in Supplementary Fig. 2g, h, whereas wild-type MMSET 
restored H4K20 methylation and 53BP1 recruitment to DSBs, 
MMSET(F1117A) did not. These data indicate that MMSET methy- 
lates H4K20 at DSBs, which facilitates the subsequent accumulation of 
53BP1. 

Previous studies indicated that the accumulation of 53BP1 at sites of 
DNA damage also requires H2AX and MDC1*”. On investigation of 
this potential connection, we found that MMSET accumulation at 
DSBs was significantly reduced in cells depleted of H2AX and 
MDC1 (Fig. 3a and Supplementary Fig. 3a), as were H4K20me2 and
the accumulation of 53BP1. Further, MDC1 foci appeared earlier than
those of MMSET, whereas MMSET foci formed earlier than those of 
53BP1 (Supplementary Fig. 3b). Thus, the accumulation of MMSET 
and the subsequent methylation of H4K20 and 53BP1 recruitment at 
DSBs seem to require H2AX and MDC1. 

Previous studies also indicate that downstream of MDC1, the E3 
ubiquitin ligase RNF8 regulates 53BP1 foci formation through its role 
in histone ubiquitination”. It is unclear whether RNF8 and MMSET
regulate 53BP1 accumulation in parallel or in the same pathway. As
shown in Supplementary Fig. 4a, downregulation of RNF8 did not 
affect MMSET recruitment and H4K20 methylation, although 
53BP1 recruitment was compromised. In addition, downregulation 
of MMSET had no effect on the recruitment of RNF8 to DSBs or 
the ubiquitination of H2A at DSBs (Fig. 2c, Supplementary Fig. 2c 
and 4b), indicating that RNF8 and MMSET function in distinct path- 
ways. Thus, the mechanism through which RNF8-mediated ubiquiti- 
nation events regulate 53BP1 recruitment remains to be determined. 

While investigating how the H2AX-MDC1 pathway regulates
MMSET accumulation at DSBs, we found that MMSET interacted
with MDC1 in a DNA-damage-inducible manner (Fig. 3b). The interaction
seemed to be specific to the MDC1 BRCT domain, as the
BRCA1 BRCT domain and the MDC1 BRCT-domain mutant
K1936M* were unable to interact (Fig. 3c and Supplementary Fig.
4c). Because BRCT domains recognize phospho-Ser/Thr motifs*”°,
it is likely that MMSET is phosphorylated following DNA damage, 
thereby facilitating its interaction with MDC1. As shown in 
Supplementary Fig. 4d, MMSET was phosphorylated at ATM con- 
sensus SQ/TQ sites after ionizing radiation. No phospho-SQ/TQ sig- 
nal was detected in ATM-deficient MEF cells or in samples treated 
with λ-phosphatase, indicating that MMSET is phosphorylated in an
ATM-dependent manner. A previous large-scale proteomic study 
demonstrated that Ser 102 of MMSET is phosphorylated by ATM after 
Figure 3 | Recruitment of MMSET to DSBs requires the ATM-H2AX-MDC1
pathway. a, ChIP analysis by PCR of indicated proteins at DSBs in
HeLa DR-GFP cells transfected with the indicated siRNA. Right panels show
western blots of H2AX and MDC1. b, Co-immunoprecipitation of MMSET
and MDC1 in HeLa cells before or after ionizing radiation. c, GST pull-down
assay of MMSET using indicated GST fusion proteins. d, 293T cells treated and
immunoprecipitated as indicated, then analysed with anti-pSQ/TQ antibody.
e, 293T cells transfected with the indicated constructs were treated as indicated,
then immunoprecipitated and immunoblotted with indicated antibodies. f, The
interaction between GST-MDC1-BRCT and indicated peptides was measured
by Biacore 3000. Axis label: GST-MDC1-BRCT concentration (µM).
DNA damage’’. As shown in Fig. 3d, mutation at S102 abolished
ATM-dependent MMSET phosphorylation after DNA damage, indicating
that S102 is the major ATM phosphorylation site of MMSET.
Further, mutation at S102 abolished the MDC1-MMSET interaction
(Fig. 3e), verifying that the phosphorylation of S102 is required for
MDC1 binding. The MDC1 BRCT domain has been shown to bind
phospho-S139 (pS139QEY) of γH2AX, and a carboxy-terminal Y at the
+3 position is critical for the binding specificity, although E at +2 is
also positively selected**”*. The MMSET sequence after S102 is QEM,
and does not contain Y at the +3 position. To confirm further the
specificity of the MDC1-MMSET interaction, we used peptides containing
either S102 or phospho-S102 to perform several assays. As
shown in Supplementary Fig. 4e, phosphopeptides of MMSET preferentially
pulled down endogenous MDC1 from cell lysates. We determined
further the binding affinity between MMSET peptides and the
MDC1 BRCT domain using surface plasmon resonance (SPR). We
found that the MDC1 BRCT domain preferentially bound MMSET
phosphopeptides (Kd = 893 nM), although with a lower affinity than it
did γH2AX peptides (Kd = 287 nM). No MDC1 binding was found for
non-phosphopeptides of MMSET (Fig. 3f).

To investigate the functional significance of MMSET phosphoryla- 
tion, we stably transfected HeLa DR-GFP cells with MMSET shRNA, 
and reconstituted these cells with shRNA-resistant wild-type MMSET 
or the MMSET(S102A) mutant. As shown in Fig. 4a, b and 
Supplementary Fig. 5a, wild-type MMSET was recruited to DSBs, 
but the recruitment of MMSET(S102A) was defective. This indicates 
that S102 phosphorylation and the MDC1-MMSET interaction are
essential for MMSET accumulation at DSBs. Similarly, reconstitution
of wild-type MMSET, but not MMSET(S102A), rescued H4K20me2/3,
53BP1 accumulation at DSBs and CHK2 phosphorylation (Fig. 4a, c
and Supplementary Fig. 5a-d). It is possible that the S102A mutation
affects MMSET methyltransferase activity and subsequent H4K20 
methylation. However, we found that the activity of MMSET and 
MMSET(S102A) towards histone H4 is comparable before and after 
ionizing radiation, indicating that the effects described earlier caused 
by S102A mutation are not due to a decreased methyltransferase activ- 
ity (Supplementary Fig. 5e, f). 

The BRCT domain of MDC1 is required for binding to γH2AX at
DNA damage sites, but it is unclear whether MDC1 uses this same 
domain to recruit MMSET to DSBs. We found that MDC1 formed 
oligomers (Supplementary Fig. 5g) and DNA damage increased the 
oligomerization of MDC1 (Supplementary Fig. 5h). Therefore, it is 
likely that different molecules in the MDC1 multimers bind γH2AX
and MMSET separately at the DNA damage sites. 

Lastly, to examine how MMSET phosphorylation ultimately affects 
cellular sensitivity to DNA damage, we performed colony formation 
assays. Depletion of MMSET resulted in a significant increase in ion- 
izing radiation sensitivity (Fig. 4d), and reconstitution with wild-type 
MMSET could reverse this effect whereas MMSET(S102A) could not. 

Our studies reveal a critical role of the methyltransferase MMSET in 
regulating the assembly of 53BP1 foci at DNA lesions (Fig. 4e). We 
show that H4K20 methylation, unlike the previously held view, is 
Figure 4 | Phosphorylation of MMSET is important for H4K20
methylation, 53BP1 recruitment and the DNA damage response. a, HeLa
DR-GFP cells were transfected with the indicated constructs; H4K20
methylation and MMSET recruitment were analysed by PCR of ChIP samples.
b, c, HCT116 cells transfected with indicated constructs were irradiated and,
10 min later, stained with indicated antibodies. d, Radiation sensitivity of cells
from c was determined by colony formation (data ± s.e.m.; n = 3). e, Model
demonstrating how the MDC1-MMSET pathway regulates DNA-damage-induced
histone H4 lysine 20 methylation and 53BP1 foci formation.
induced at DSBs. We also establish a previously unrecognized link
between the H2AX-MDC1 pathway and H4K20 methylation, and 
show that MMSET connects these two pathways. These results indi- 
cate that multiple myeloma tumours with t(4;14) translocation and 
MMSET dysregulation may have aberrant responses to DNA damage, 
which may be related to the poor prognosis observed in this subgroup 
of patients that are treated with DNA alkylating agents. 


METHODS SUMMARY 

HeLa DR-GFP and MDA-MB-231 ROSS cell lines were used for the ChIP assays, 
which were subsequently analysed by PCR or qPCR. Co-immunoprecipitation 
was used to detect protein-protein interactions in vivo and SPR was used to detect 
the affinity for the protein and peptide interaction in vitro. Transient transfection 
of short interfering RNA (siRNA) or stable downregulation by shRNA was used to 
decrease the level of specific proteins. Immunofluorescence staining was used to 
visualize protein accumulation and localization after DNA damage. 


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 
METHODS

Plasmids and shRNAs. The MMSET(S102A) mutant was generated by PCR-based 
site-directed mutagenesis against full-length MMSET (pCEFL-MMSET-II). Wild- 
type MMSET or MMSET(S102A) was cloned into a pIRES2 vector containing S- and 
Flag-tag. shRNA-resistant constructs were made by introducing a silent mutation at 
the MMSET coding region (1666-1671; CTTCGG to CTGCGA). The MDC1 FHA 
and BRCT domains were cloned into the pGEX4T-1 vector for bacterial expression 
of GST fusion proteins. 
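
The shRNA-resistance change described above (CTTCGG to CTGCGA) can be checked for silence with a few lines of Python. This is only a sketch: it assumes the six bases at positions 1666-1671 form two complete in-frame codons (which holds if numbering starts at the first base of the coding region), and the function names `translate` and `is_silent` are illustrative, not taken from the paper.

```python
# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate an in-frame DNA string into one-letter amino-acid codes."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def is_silent(original, mutated):
    """True if both sequences have the same length and encode the same peptide."""
    return len(original) == len(mutated) and translate(original) == translate(mutated)


# Assumption: the hexamer is read as two codons, CTT|CGG -> CTG|CGA.
original, resistant = "CTTCGG", "CTGCGA"
print(translate(original), translate(resistant), is_silent(original, resistant))
```

Under that frame assumption both hexamers encode Leu-Arg, so the protein is unchanged while two bases of the shRNA target differ.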

MMSET shRNA 1: 5'-GCACGCTACAACACCAAGTTT; MMSET shRNA 2:
5'-GCACAGTCTTCGGAAGAGAGACACAATCA; control shRNA:
5'-TTCAATAAATTCTTGAGGT; MDC1 siRNA (MDC1 cDNA 58-76):
UCCAGUGAAUCCUUGAGGUdTdT; control siRNA: UUCAAUAAAUUCUUGAGGUdTdT;
H2AX siRNA: CAACAAGAAGACGCGAAUCAdTdT; 53BP1 siRNA:
5'-AAGAUACUCCUUGCCUGAUAA-3'; RNF8 siRNA:
5'-AGAAUGAGCUCCAAUGUAUUU-3'.
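
As a quick consistency check on the sequences listed above, the control shRNA (written as DNA) and the control siRNA (written as RNA with a two-base overhang) appear to share the same 19-nucleotide core. The sketch below makes that comparison explicit; it assumes the overhang is dTdT, and the helper names are illustrative only.

```python
def dna_to_rna(dna):
    """Write a DNA sequence in the RNA alphabet (T -> U), same strand."""
    return dna.upper().replace("T", "U")


def strip_overhang(sirna, overhang="dTdT"):
    """Remove a trailing deoxythymidine overhang, if present."""
    return sirna[:-len(overhang)] if sirna.endswith(overhang) else sirna


control_shrna = "TTCAATAAATTCTTGAGGT"        # DNA notation, from the list above
control_sirna = "UUCAAUAAAUUCUUGAGGUdTdT"    # RNA notation with assumed dTdT overhang

same_core = dna_to_rna(control_shrna) == strip_overhang(control_sirna)
print(len(control_shrna), same_core)
```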
Antibodies and cell lines. MMSET antibodies were provided by J. D. Licht or 
purchased from Abcam. Commercial antibodies used for ChIP were obtained from 
Upstate Biotechnology (γH2AX mouse monoclonal), Millipore (H4, H4K20me1/2/3,
H2AUb), Active Motif (H4K20me2) and Novus (rabbit 53BP1). Antibodies
against p53, pSQ/TQ, phospho-CHK2, CHK1 and phospho-CHK1 were pur- 
chased from Cell Signaling. CHK2 antibody was purchased from Millipore. 
RNF8 antibody was purchased from Abcam. MDC1 antibodies have been previously
described®.

293T cells were cultured in RPMI 1640 medium supplemented with 10% fetal 
bovine serum (FBS). HCT116 and U2OS cells were cultured in DMEM supplemented
with 10% FBS. HeLa DR-GFP cell lines were cultured in DMEM supplemented
with 10% FBS and 2 ng µl⁻¹ puromycin. Mouse embryonic fibroblasts
(MEFs) were cultured in DMEM containing 10% FBS and 5% ES. 
Immunoprecipitation, immunoblotting, and in vitro pull-down assays. We 
prepared cell lysates, performed immunoprecipitation, and immunoblotting as 
previously described”’. GST fusion proteins were bound to glutathione sepharose 
overnight at 4°C. The beads were washed with PBS twice and incubated with cell 
lysates for 3h at 4°C. Beads were then washed with NETN buffer (20 mM Tris- 
HCl, pH 8.0, 100 mM NaCl, 1 mM EDTA, 0.5% Nonidet P-40) three times, and 
proteins bound to beads were eluted by SDS sample buffer (100 °C for 12 min) and 
separated by SDS-PAGE for western blot analysis. 

Immunofluorescence staining. Cells grown on coverslips were fixed with 3% 
paraformaldehyde solution in 1× PBS containing 50 mM sucrose at room temperature
(22 °C) for 15 min. After permeabilization with 0.5% Triton X-100 buffer
containing 20 mM HEPES pH 7.4, 50 mM NaCl, 3 mM MgCl2 and 300 mM sucrose
at room temperature for 5 min, cells were blocked with 5% goat serum for 1 h at
room temperature, then incubated with primary antibodies at 37 °C for 20 min. 
After washing with PBS twice, cells were incubated with FITC or rhodamine- 
conjugated secondary antibodies at 37 °C for 20 min. Nuclei were counterstained 
with 4’,6-diamidino-2-phenylindole (DAPI). After a final wash with PBS, cover- 
slips were mounted with glycerin containing paraphenylenediamine. 

ChIP. Induction of a single DSB in HeLa DR-GFP cells was performed through 
transfection of the I-SceI expression plasmid. Twenty-four hours after transfection,
about 5 × 10” cells were treated with 1% formaldehyde for 10 min at room
temperature to crosslink proteins to DNA. Glycine (0.125 M) was added and 
incubated at room temperature for 5 min to stop the cross-linking. Cells were 
harvested and the pellets were resuspended in cell lysis buffer (5 mM PIPES 
(KOH), pH 8.0, 85mM KCl, 0.5% NP-40) containing the following protease 
inhibitors: 1 µg ml⁻¹ leupeptin, 1 µg ml⁻¹ aprotinin and 1 mM PMSF; and incubated
for 10 min on ice. Nuclei were pelleted by centrifugation (2,200g for 5 min).
Nuclei were then resuspended in nuclear lysis buffer (50 mM Tris, pH 8.1, 10 mM 
EDTA, 1% SDS containing the same protease inhibitors as in cell lysis buffer) and 
sonicated to shear chromatin to an average size of 0.6 kb. Once centrifuged until 
clear, the lysates were precleared overnight with salmon sperm DNA/protein-A 
agarose slurry. Twenty per cent of each supernatant was used as input control and 
processed with the cross-linking reversal step. The rest of the supernatant (about 
80% of the total) was incubated with 5 µg of the indicated antibody overnight at
4°C with rotation. Complexes were washed four times, once in high salt buffer 
(50 mM Tris-HCl, pH 8.0, 500 mM NaCl, 0.1% SDS, 0.5% deoxycholate, 1% NP-40,
1 mM EDTA), once in LiCl buffer (50 mM Tris-HCl, pH 8.0, 250 mM LiCl, 1%
NP-40, 0.5% deoxycholate, 1 mM EDTA) and twice in TE buffer (10 mM Tris-HCl,
pH 8.0, 1 mM EDTA, pH 8.0). Beads were resuspended in TE containing
50 mg ml⁻¹ of RNase and incubated for 30 min. Beads were washed with water and
elution buffer (1% SDS, 0.1 M NaHCO3) was added for 15 min. Crosslinks were
reversed by adding 10 µg ml⁻¹ RNase and 5 M NaCl to a final concentration of
0.3 M to the eluates and incubating in a 65 °C water bath for 4-5 h. Two volumes of
100% ethanol were added to precipitate overnight at −20 °C. DNA was pelleted
and resuspended in 100 µl of water; 2 µl of 0.5 M EDTA, 4 µl of 1 M Tris, pH 6.5, and
1 µl of 20 mg ml⁻¹ proteinase K were added and incubated for 1-2 h at 45 °C. DNA
was then purified and used in PCR reactions. 
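
The crosslink-reversal step above brings the eluate to 0.3 M NaCl from a 5 M stock. The volume of stock this requires follows from a simple mass balance, sketched below. The 100 µl eluate volume is a hypothetical example (the protocol does not state it), and the eluate is assumed to contain no NaCl beforehand, since the elution buffer listed contains only SDS and NaHCO3.

```python
def stock_volume_to_add(sample_volume, stock_conc, final_conc, initial_conc=0.0):
    """Volume of stock to add so the mixture reaches final_conc.

    Solves (initial_conc * V + stock_conc * v) / (V + v) = final_conc for v.
    Volumes share one unit; concentrations share one unit.
    """
    if not initial_conc <= final_conc < stock_conc:
        raise ValueError("need initial_conc <= final_conc < stock_conc")
    return sample_volume * (final_conc - initial_conc) / (stock_conc - final_conc)


eluate_ul = 100.0  # hypothetical eluate volume, for illustration only
nacl_ul = stock_volume_to_add(eluate_ul, stock_conc=5.0, final_conc=0.3)
print(f"add {nacl_ul:.2f} µl of 5 M NaCl to {eluate_ul:.0f} µl of eluate")
```

Note that the naive shortcut (0.3 / 5 of the sample volume) slightly underestimates the amount, because the added stock also increases the total volume.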

The PCR primers for ChIP, about 220 bp away from the I-SceI cut site, were as
follows: forward, 5'-TACAGCTCCTGGGCAACGTG-3'; reverse,
5'-TCCTGCTCCTGGGCTTCTCG-3'.
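
For readers who want a rough feel for these primers, the sketch below computes their GC content and a melting-temperature estimate using the Wallace rule (2 °C per A/T, 4 °C per G/C). The Wallace rule is a crude approximation that is not mentioned in the paper and is least reliable for oligonucleotides of this length, so the numbers are indicative only.

```python
def gc_fraction(seq):
    """Fraction of bases that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq):
    """Rule-of-thumb melting temperature in °C: 2*(A+T) + 4*(G+C)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


chip_primers = {
    "forward": "TACAGCTCCTGGGCAACGTG",
    "reverse": "TCCTGCTCCTGGGCTTCTCG",
}

for name, seq in chip_primers.items():
    print(name, len(seq), f"GC={gc_fraction(seq):.0%}", f"Tm~{wallace_tm(seq)} °C")
```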

Amplification was performed using the following program: 95 °C for 5 min, 1
cycle; 95 °C for 45 s, 56 °C for 30 s and 72 °C for 30 s, 30 cycles; 72 °C for 10 min, 1
cycle. A total of 12.5 µl of the PCR products was applied to a 1.2% agarose gel and
visualized by ethidium bromide staining.
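
The amplification program above can be written down as a small data structure, which makes it easy to sanity-check the total hold time. This is an illustrative sketch only: the layout and the name `PCR_PROGRAM` are assumptions, and ramping time between temperatures is ignored.

```python
# Each stage repeats its steps `cycles` times; a step is (temperature in °C, seconds).
PCR_PROGRAM = [
    {"cycles": 1, "steps": [(95, 5 * 60)]},                    # initial denaturation
    {"cycles": 30, "steps": [(95, 45), (56, 30), (72, 30)]},   # denature, anneal, extend
    {"cycles": 1, "steps": [(72, 10 * 60)]},                   # final extension
]


def total_hold_seconds(program):
    """Sum of all programmed hold times, excluding temperature ramps."""
    return sum(
        stage["cycles"] * sum(seconds for _, seconds in stage["steps"])
        for stage in program
    )


seconds = total_hold_seconds(PCR_PROGRAM)
print(f"{seconds} s of programmed holds ({seconds / 60:.1f} min)")
```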
Quantitative analysis of ChIP samples. qPCR was performed on a 7500 RT-PCR
System (Applied Biosystems) using the SYBR Green detection system with the 
following program: 95 °C for 5 min, 1 cycle; 95°C for 45s and 62°C for 45s, 40 
cycles. As an internal control for the normalization of the specific fragments 
amplified, a locus outside the region of the DSB was amplified, in this case 
FKBP5, using the input control sample as template. The internal control 
(FKBP5) primers were as follows: forward, 5'-CAGTCAAGCAATGGAAGAAG-3';
reverse, 5'-CCCGTGCCACCCCTCAGTGA-3'.

After qPCR amplification, the FKBP5 input controls for untransfected (no DSB)
and I-SceI-transfected (DSB) samples were used to normalize the untransfected and
transfected samples, respectively. After normalization, the relative levels of the indicated
proteins on a DSB were calculated by comparison of untransfected and I-SceI-transfected
samples to their respective IgG controls. All qPCR reactions were
performed in triplicate, with the s.e.m. values calculated from at least three inde- 
pendent experiments. 
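
One common way to carry out a normalization of this kind is a comparative Ct calculation, sketched below. This is an assumption about the arithmetic, not the authors' stated procedure: it supposes perfect doubling per cycle (signal proportional to 2^-Ct), and all Ct values shown are hypothetical.

```python
import math
import statistics


def normalized_signal(ct_target, ct_reference):
    """Signal at the break site relative to the FKBP5 input reference."""
    return 2.0 ** -(ct_target - ct_reference)


def enrichment_over_igg(ct_antibody, ct_igg, ct_reference):
    """Relative enrichment of an antibody ChIP over the matching IgG control.

    The reference term cancels within one condition, but keeping it makes the
    normalized signals comparable between untransfected and transfected samples.
    """
    return normalized_signal(ct_antibody, ct_reference) / normalized_signal(ct_igg, ct_reference)


def mean_and_sem(values):
    """Mean and standard error of the mean across independent experiments."""
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(len(values))


# Hypothetical Ct values from three independent experiments (I-SceI transfected).
experiments = [
    {"antibody": 26.0, "igg": 29.0, "reference": 22.0},
    {"antibody": 26.5, "igg": 29.2, "reference": 22.1},
    {"antibody": 25.8, "igg": 29.1, "reference": 21.9},
]
fold = [enrichment_over_igg(e["antibody"], e["igg"], e["reference"]) for e in experiments]
mean, sem = mean_and_sem(fold)
print(f"enrichment over IgG: {mean:.2f} ± {sem:.2f} (n = {len(fold)})")
```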

Biacore analysis. Binding was analysed in a Biacore 3000 system. The relevant 
biotinylated peptides (MMSET peptide sequences: biotin-AKLRFESQEMKG; 
pMMSET peptide sequences: biotin-AKLRFE(p)SQEMKG; H2AX peptide
sequences: KKATQASQEY; γH2AX peptide sequences: biotin-KKATQApSQEY)
were bound to an SA sensor chip (GE Healthcare). The indicated concentrations of 
bacterially expressed GST-MDC1-BRCT in HBS-EP (HEPES-buffered saline with 
EDTA and polysorbate 20; 10 mM HEPES, pH 7.4, 0.15 M NaCl, 3 mM EDTA and 
0.005% (v/v) polysorbate 20) were injected over the immobilized peptides at a flow 
rate of 80 µl min⁻¹. Interactions between each peptide and GST-MDC1-BRCT were
analysed and steady-state binding was determined at each concentration. 
Regeneration of the sensor chip surface between each injection was performed with 
three consecutive 5-µl injections of a solution containing 50 mM NaOH and 1 M
NaCl.
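
Steady-state responses of this kind are often summarized by fitting a 1:1 binding isotherm, R = Rmax · C / (Kd + C). The sketch below shows one simple way to do that with the standard library only. It is not the authors' analysis: the binding model, the grid search, and the data (noise-free values generated from a Kd of 893 nM and an arbitrary Rmax of 100) are all assumptions for illustration.

```python
def fraction_bound(conc, kd):
    """Fractional occupancy for simple 1:1 binding (same units for conc and kd)."""
    return conc / (kd + conc)


def fit_steady_state(concs, responses, kd_min=1e-9, kd_max=1e-4, steps=2000):
    """Fit Kd and Rmax by scanning Kd on a log grid.

    For each candidate Kd the least-squares Rmax has a closed form, so only
    Kd needs to be searched. Returns (kd, rmax) with the smallest squared error.
    """
    best = None
    for i in range(steps + 1):
        kd = kd_min * (kd_max / kd_min) ** (i / steps)
        occupancy = [fraction_bound(c, kd) for c in concs]
        rmax = sum(r * f for r, f in zip(responses, occupancy)) / sum(f * f for f in occupancy)
        sse = sum((r - rmax * f) ** 2 for r, f in zip(responses, occupancy))
        if best is None or sse < best[0]:
            best = (sse, kd, rmax)
    return best[1], best[2]


# Hypothetical, noise-free data in molar units (not measurements from the paper).
TRUE_KD, TRUE_RMAX = 893e-9, 100.0
concs = [0.1e-6, 0.3e-6, 1e-6, 3e-6, 10e-6]
responses = [TRUE_RMAX * fraction_bound(c, TRUE_KD) for c in concs]

kd_fit, rmax_fit = fit_steady_state(concs, responses)
print(f"Kd ~ {kd_fit * 1e9:.0f} nM, Rmax ~ {rmax_fit:.1f}")
```

The grid resolution limits the precision of the estimate; a dedicated curve-fitting routine would normally be used on real sensorgram data.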

In vitro histone methyltransferase assay. HA-MMSET and HA-53BP1 were 
expressed and purified from 293T cells with haemagglutinin (HA) tag antibody 
and subsequent HA peptide elution. Recombinant histone H4 protein was from
Upstate. In vitro histone methyltransferase assay was carried out according to the 
manufacturer’s instructions (SAM510: SAM Methyltransferase Assay kit, 
G-Biosciences). In brief, all proteins were dialysed against 0.1 M Tris-HCl, pH 
8.0. 20 µM HA-MMSET (or HA-MMSET(S102A) mutant) and 20 µM H4 (or
HA-53BP1) were used for every reaction. Absorbances at 510 nm were measured
every 10-30 s at 37 °C until the increasing absorbances reached a plateau or the
reactions were stopped by boiling in SDS buffer, their contents separated by 15% 
SDS-PAGE, and the methylation of H4 was visualized by immunoblotting with 
anti-H4K20me2 antibodies (Upstate).

Laser irradiation and immunofluorescence staining. A partially customized 
‘laser-scissors’ microirradiation system with an inverted microscope (Nikon, Ti- 
E), a laser ablation unit (Photonic Instruments, MicroPoint) and microscope 
automation and imaging software (Molecular Devices, MetaMorph) were used 
to introduce DNA damage in cultured cells. A 337-nm nitrogen laser (with 
1-20 Hz repetition rate, 2-6 ns pulse duration and 120 µJ/pulse energy) transmits
radiation through an optical fibre and a dye cell containing a solution that produces
a 551-nm dye laser. The laser microbeam is then focused by a 63× (NA 1.4)
oil immersion microscope objective. The total laser energy delivered to each 
focused spot was set by an attenuator plate (50% transmission) and the number 
of pulses. Cells were cultured on 35-mm glass-bottomed dishes (MatTek
Cultureware, P35G-15-14-C) before laser irradiation. 

Following laser irradiation, cells were fixed with 4% paraformaldehyde (Electron
Microscopy Sciences) for 10 min at room temperature. Immunofluorescence
staining was performed as previously described”®. Cells were then imaged using 
the Nikon microscope and the MetaMorph software described above. 


29. Kim, J.E., Chen, J. & Lou, Z. DBC1 is a negative regulator of SIRT1. Nature 451, 
583-586 (2008). 

30. You, Z. et al. CtIP links DNA double-strand break sensing to resection. Mol. Cell 36, 
954-969 (2009). 
BY LAURA BONETTA


Blog author Biochembelle’s post about her decision to leave a postdoc
position struck a chord with many readers.
She revealed an unsettling trend: a successful 
graduate student who had worked, published, 
defended her thesis and found the postdoc 
position of her dreams, only to discover some 
months later that those dreams had become a 
distant memory. The reality is that the ‘sure-fire
home-run’ project isn’t working out, the
new department and environment aren’t such
good fits after all, and there are personality
clashes with the principal investigator. “You
are tired, angry, bitter, depressed,” she wrote.
“You have turned into the ‘disgruntledoc’ that
you swore you’d never become.”

Biochembelle, who asked to be referred to by 
the pseudonym she uses on her blog, has moved 
from one postdoc position in the United States 
to another, at a research and teaching hospital. 
She says she heard from many researchers in 
similar situations who, like her, chose to switch 
labs for various reasons. “For me, there were 
enough issues going on that I did not feel like 
I could do my best work in the environment I 
was in,” says Biochembelle. “I don’t think it is
ever just one thing that pushes you to the point 
where it is time to walk away.” 

Regardless of the motivation, switching labs 
is not easy and postdocs should think carefully 
about it. Sometimes, problems can be resolved 
by a frank discussion with the principal investigator,
perhaps by enlisting the help of a trusted
mentor or adviser. But if walking away seems 
to be the best course of action, there are some 
steps that can help to ensure a smooth transi- 
tion with few, if any, negative repercussions. 

When informing their principal investigator
of the move, postdocs should focus on
career goals and opportunities rather than
on personality issues. They should also give
plenty of notice and find a way to either complete
projects or leave them in such a state that
they can be continued by someone else. And,
of course, they should choose their next labs
wisely. These measures can help postdocs to
nourish their careers and stave off stagnation
as they make their next moves.


Taking the plunge

Switching to a new postdoc may be risky and challenging,
but it does not have to be career-threatening.

… postdoc before finding a permanent position.
According to the US National Science Foundation, …
biological, agricultural and environmental
life sciences had held one postdoc position,
16% two postdoc positions, and 3.6% three or 
more. And the reasons are varied. “A lot of peo- 
ple do more than one postdoc not because the 
first postdoc did not work out but because they 
did not find a regular career position or because 
they wanted to get additional training,” says 
Thomas Gething, director of the office of post- 
doctoral affairs at the University of Washington 
Graduate School in Seattle. However, spending 
too much time as a postdoc can have negative 
consequences, especially in countries that have 
forced retirement at a certain age. In Germany, 
for example, the retirement age is 67, so time is 
precious. “If you work backwards, you need to 
be a full professor by your mid-40s,” says Jonathon
Howard, director of the Max Planck Institute
of Molecular Cell Biology and Genetics in
Dresden, Germany. “We tend to hire group 
leaders in their early 30s so if you are spending 
more than 5 years total as a postdoc I would 
think you would run into problems.” 

This time constraint means that postdocs 
cannot afford to linger in unproductive posi- 
tions. “Having to switch postdoc labs may not 
be an ideal situation, but sticking it out can 
be even worse,” says Rania Sanford, assistant 
dean for postdoctoral affairs at Stanford Uni- 
versity in California. Staying in an unworkable 
postdoc to demonstrate commitment could be 
more detrimental than 
moving on, she says. 

Howard agrees. After 
completing a PhD in 
Australia, he spent less 
than a year in his first 
postdoc in Britain before switching to a lab where
he felt he could be more productive. “I knew after
about six months that I should move on,” he
says. The negative consequences were minimal,
he says, because it was such a short stint. “I
think it would be worse to stay five years in a
postdoc and not get anything out of it.”

“You want to make sure you don’t go from one bad situation to another.” Rania Sanford


TOUGH CHOICES 

But although switching labs can have career 
benefits, it is not always easy or practical to do. 
This is especially true for postdocs with visa 
concerns. Moving to a new university requires 
a lot of paperwork and there is no guaran- 
tee that a new visa will be granted in time. In 
extreme cases, leaving a lab might mean leaving 
the country. It can also run counter to a deeply 
ingrained cultural milieu. In China, “we have a 
deep belief that you have to face a problem head 
on and not give up”, explains Stanford University
postdoc Xiaomeng Milton Yu. “If I were to
say to my Chinese friends that I don’t get along
with my supervisor and want to leave the lab,
they may see that as giving up.” Although Yu is
happy in his current position, he says he knows
of a few Chinese postdocs who have left their
labs because they were unhappy with them. But,
he says, “I think in general foreign postdocs are
more likely to stick it out.”

Postdoc problems often arise because of 
differing expectations between the principal 
investigator and the postdoc related to project 
focus, productivity, research style and the post- 
doc’s career-development goals. A postdoc 
might lament the research time that they need 
to sacrifice to supervise others in the lab; the 
project might not fit the postdoc’s interests or 
career goals, or it might require skills that the 
postdoc lacks. 

In many cases, such misunderstandings can 
be resolved simply by opening the lines of com- 
munication between principal investigator and 
postdoc. “Many times they are worried about the 
same thing,” says Jo Handelsman, a professor of 
molecular, cellular and developmental biology 
at Yale University in New Haven, Connecticut. 
Both might, for example, worry that the postdoc 
has not yet published any work. “The worst thing 
is leaving things to fester,” says Handelsman.

When direct communication isn't possible, 
a postdoc might consider reaching out to other 
colleagues — perhaps a department chair, 
another principal investigator or staff in the 
postdoc or ombudsman office. At Imperial Col- 
lege London, all chemistry postdocs are offered 
an academic mentor, separate from their princi- 
pal investigator, whom they get to know and can 
go to for “confidential advice of any sort”, says
postdoc Nick Brooks, head of Imperial’s Chem- 
istry Postdoc Development Team. “This system 
can help to mitigate potential problems between 
postdocs and principal investigators.”

But talking has its limits when stark personality
clashes arise. “This is a much trickier problem
to fix than a conflict in goals,” says Dorothy
Shippen, a biochemist at Texas A&M University 
in College Station. In such cases, says Shippen, 
postdocs should think strategically about the 
decision to leave. First, she says, they should con- 
sider whether they can stay long enough to get 
something accomplished, such as a publication
or a good letter of recommendation. But if they
are losing respect for their principal investigators,
losing their love of science, or expecting to
accomplish little by staying, they might need to
look for new positions, she says.

Nick Brooks and Jonathon Howard suggest tackling issues with supervisors to avoid career problems.

A lack of funding can be an even more for- 
midable obstacle to a successful postdoc expe- 
rience than disagreements with the principal 
investigator. In some places, postdocs can be 
reliant on the principal investigator for grant 
money. According to the National Science 
Foundation, in the autumn of 2006, 56% of 
science and engineering postdocs at US uni- 
versities were funded through federal research 
grants, up from 52% in 1993. Grant applications 
being denied can mean that a principal investi- 
gator cannot afford to keep all of his or her post- 
docs. And if a lab has an uncertain financial 
future, and the principal investigator is waiting 
for the results of various grant applications, a 
postdoc might find it prudent to seek a position
that promises more grants. “That is happening
a lot right now,” says Lynn Zechiedrich,
a principal investigator and microbiologist at 
Baylor College of Medicine in Houston, Texas. 
But although a lack of funding can present an 
opportunity to let go of unproductive staff, 
principal investigators are often prepared to 
go the extra mile for those who are worth it. 
Zechiedrich, for instance, was ready to forgo 
her own pay rise to pay a postdoc’s salary while 
waiting for a grant to come through; another 
postdoc took a one-month furlough until the 
funding was available. “These were smart, 
hard-working postdocs, and now their results 
have helped us get more grants funded, so 
sometimes you have to get creative to maintain 
a postdoc position,” says Zechiedrich. 


EXIT PLAN 

Whatever the reason, the decision to switch 
labs should not be a purely emotional reaction, 
says Sibby Anderson-Thompkins, director of 
the office of postdoctoral affairs at the Univer- 
sity of North Carolina at Chapel Hill. “Make 
a list of pros and cons based on your expectations
and career goals and on the overall
environment and relationships,” she says.
“Then ask yourself if moving to another
lab will give you more opportunities and
more viable projects that result in papers
or publications.”

As they weigh up their options, postdocs 
should consider talking discreetly to other 
principal investigators about joining their 
labs. It’s also a good idea to learn as much 
as possible about the principal investigator 
and lab environment from current and past 
lab members. “You want to make sure you 
don't go from one bad situation to another,”
says Sanford. She advises that postdocs 
find out exactly what would be expected 
of them. They should discuss the skills and 
training they have, and what they need to 
develop in the next year. “It is also impor- 
tant to understand what the lab direction is, 
what the principal investigator wants to do, 
and what the grants situation is,” she says, 
“so that there are no surprises.” 

When approaching new labs, a postdoc 
should avoid disparaging the principal 
investigator of the lab he or she is leaving. 
“You can say there were challenges, but focus 
on the lessons learned and skills gained,” 
says Anderson-Thompkins. “It is okay to say 
that you wanted to pursue other opportunities,
but you don't have to say how bad the
lab or the principal 
investigator was.” 

If, after careful consideration, a postdoc decides to
leave his or her lab, the postdoc should inform the
current principal investigator of the decision promptly,
and make sure the conversation focuses on professional
rather than personal issues. Discussions should
also focus on finishing existing projects or 
handing them over to other members of 
the lab. “Give them plenty of lead time and 
wrap up what is going on,” says Anderson- 
Thompkins. “That will help you leave on 
the best possible terms.” It could also mean 
that the postdoc is still able to garner a sup- 
portive letter of recommendation from the 
principal investigator in the future. And 
even if a glowing recommendation is out of 
the question, chances are that the ‘old’ prin- 
cipal investigator will be a collaborator or 
grant reviewer or a close friend to someone 
on a hiring committee. “You want to walk
out the door with a good reputation,” says 
Anderson-Thompkins. “Don't do anything 
that will hurt your career.”


Laura Bonetta is a freelance writer based 
in Garrett Park, Maryland. 


TURNING POINT 



Jonathan Rothberg 


Last December, Jonathan Rothberg, founder 
and chief executive of Ion Torrent, a 
biotechnology company based in Guilford, 
Connecticut, released the Personal Genome 
Machine. The US$50,000 desktop DNA 
sequencer will, he says, greatly improve access 
to genome sequencing. 


What decision was pivotal in your early career? 
I was interested in chemistry and engineering 
in high school, and did a chemical engineer- 
ing undergraduate degree at Carnegie Mellon 
University in Pittsburgh, Pennsylvania. But my 
interests in biology and cognitive psychology 
were growing, and I had to decide which to fol- 
low for a PhD. I knew I wanted a set of tools that 
would make me marketable. The explosion in 
biology from genome sequencing set me up to 
combine my interests in computers, biology and 
engineering, and have an impact in a rapidly 
emerging field. So I got a PhD in biology from 
Yale University in New Haven, Connecticut. 


What is your advice to young scientists? 
Master a number of fields. There will always 
be someone better than you at physics, maths 
or chemistry, but if you focus on mastering a 
few things you love, nobody will be better at 
that intersection. 


Who had the biggest influence on your career? 
Steve Jobs [co-founder of Apple]. I loved the 
way he was changing the world in 1984. I saw 
him give a presentation in which he said the 
most profound thing I had heard — that the 
reason he had become influential was that he 
‘just did it’. I know it sounds like a Nike commercial,
but it hit home that most people simply
think about things, and don't do them.


Are you a scientist, inventor or entrepreneur? 

I would say scientist and inventor. I am not 
an academic so I don't publish very often, but 
my publications have been on the covers of 
Nature and Science. I’m an entrepreneur only
because assembling smart people and funding 
is essential to bringing inventions to market. 
But scientific needs inspire my inventions. For 
example, my newborn son had a health scare in 
1999. The doctors had no way to tell whether 
he had an inherited disease, and I realized that 
an invention able to sequence an individual 
genome quickly would be useful. That idea 
sparked my second company, 454 Life Sciences. 
But my inventions also give me access to inter- 
esting, ground-breaking science. I cold-called 
Svante Pääbo, a geneticist at the Max Planck
Institute for Evolutionary Anthropology in
Leipzig, Germany, and told him that I had a
machine to help sequence the Neanderthal 
genome — which led to a collaboration. 


Is the Personal Genome Machine a turning point 
just for your career or for science in general? 

I hope it is pivotal for science in general. We 
made a semiconductor device that sees chemistry
in real time. A chip measures electrical
charges during DNA replication, which lets it 
decode the sequence. It’s a connection between 
chemistry and the digital world. This means 
that the sequencing machine will one day be as 
ubiquitous and cheap as the mobile phone. 


What skills do you think will be most in 
demand in the coming decade? 

Quantitative skills — the ability to do calcu- 
lations and estimations. Biology is great, but 
you need analytical skills. It no longer helps 
simply to describe something. We need more 
people at the intersections of fields. For exam- 
ple, bioinformaticians don’t have to have a PhD 
in molecular biology, but they need enough of 
an understanding to develop an intuition about 
how systems work. 


How should would-be inventors go about 
bringing a technology to market? 

They should do the hardest experiment, the 
one that poses the biggest obstacle to success, 
first — otherwise they could find themselves 
ten years later having made little progress. 
Many people lose themselves by not ask- 
ing tough enough questions about their own 
inventions. If you can't clear the biggest hurdle, 
you are wasting everyone's time.


INTERVIEW BY VIRGINIA GEWIN

TO THE STARS

Trade mission.


BY KEN LIU AND SHELLY LI


Ad Astra Community Forum
twinkle_twinkle (moderator): Last week,
ISS scientists announced that they’ve confirmed
spectral signatures of water, methane,
ozone and carbon dioxide in the
atmosphere of the fourth planet orbiting
Gliese 581. Hopefully, intelligent
beings occupy this planet. As ISS scientists
have not detected any electromagnetic
radiation, we can assume that Gliese 581 is
not any more technologically advanced than
Earth. The natural next step is to establish
contact. Marco Polo forged his way down
the Silk Road in the late thirteenth century,
fostering trade between Europe and Asia.
Likewise, Ad Astra prides itself on its pioneering
investment in space exploration. We
have decided to fund a private trade mission
to Gliese 581, to be launched next year. Ad
Astra predicts much potential for interstellar
trade in the years to come. Estimated flight
time is 40 years, so the maximum payload we
can transport is 200 kilograms and nonperishable
(though if anyone knows of a way to
get cheap antimatter so we can boost the payload,
feel free to mention in the comments).
Most importantly, Ad Astra is seeking
your suggestions. What should we carry to
Gliese 581? As long as it fits the requirements
listed above, we will be happy to hear your
thoughts. If we use your suggestion, you may
receive an honorarium of one-tenth of 1%
from our profits. (Note: the right to receive
the honorarium is not assignable or transferable
by inheritance.)

nolo_contendere (forum astronaut,
second class): A non-assignable royalty of
0.1%? Are you kidding? As it'll be at least 80
years before your ship even gets back, only
kids will take up your offer. Good luck shipping
pokemons and cotton candy to methane
breathers.

iheartlucy (forum space cadet): Haven't
they been awash in signals from our reruns
for decades with nothing to watch them on?
Can we ship them TV sets and charge for
every show, INCLUDING COMMERCIALS?
It'll be the ultimate syndication market.

Anon_4437 (forum guest): Is everyone
asleep here? The first priority ought be to
conquer them. If they haven't got to radio
yet, we should send a platoon of Marines
and take them down like Pizarro did in Peru.
Then we can take whatever we want.

iamnotneilarmstrong (forum astronaut,
second class): 4437, check your reading
comprehension. 200 kilos is the limit. And
NON-PERISHABLE.

Anon_4437: Oh, right. How about one 
fully equipped attack drone then? 

twinkle_twinkle: 4437, grow up. This is a 
trading mission of peace. 

Anon_4437: You don’t get to the top of
the food pyramid by being politically correct 
wimps. I’m giving you the most economi- 
cally efficient suggestion. 

twinkle_twinkle: I’ve banned 4437's IP. 
Please keep the discussion focused. 

iheartlucy: Does anybody know how to 
delete a previous post? I have a friend who 
might get me a meeting with some TV execs, 
and I wanna keep my idea for them. I copy- 
right it. I patent it. Don’t take it. 

iamnotneilarmstrong: No one cares. 
How do you know if the aliens even have 
eyes? Or a sense of humour?

triune (forum astronaut, first class): 
What about just trading for ideas? With 
the cost of fuel, it’s uneconomical to trade 
even for diamonds. Probably best to use the 
weight allotment for a powerful transmitter, 
then we can just phone back and forth and 
share discoveries. 

iamnotneilarmstrong: Long time no see, 
triune! Where’ve you been? I like your idea, 
not the least because it cuts down travel time 
to light speed, so only 40 years round-trip. 
What are the first ideas we want to share with
them? 

triune: Democracy, pacifism and a dose of 
the greatest hits from our religions. 

veryliberal (forum guest): triune, how
is this better than Anon_4437’s very sensible
suggestion? You want to brainwash
them with opiates for the masses and then
take their stuff? At least Anon_4437 was honest.

twinkle_twinkle: I’ve banned veryliberal.
The account seems to be a sock puppet for 4437.

musings033 (forum guest): I agree with
triune’s assessment on the economics of
physical goods (also keep in mind that there
is actually an infinitesimal chance of possessing
something that another form of
intelligence would want). As for specific
ideas to transmit, why not just try to get to
know each other first? The best way to do
that, I think, is to be open and honest about
our world, in hopes of reciprocation. We can
compile a narrative — a BBC ‘documentary’
if you will — on Earth, our lands and oceans
and all our beautiful organisms and ideas and
constructions, culminating with the growth
and maturation of a baby. Meanwhile, we'll
also send a documentary that details our
killing fields and memorials, our faces of
fear and courage, our acts of barbarism and
compassion upon one another and upon our
planet. I don’t know how we'll show them
everything, but if the beings cannot see, we
will make them hear. If they cannot hear, we
will make them feel. We are good and beautiful,
and we have given Earth scars that will
never properly heal. And so we will present to
Gliese 581 our humanity, the purest and the
evilest thing in our possession. That, in itself,
is the first idea worth trading.

twinkle_twinkle: That’s a very complicated
‘first idea’.

triune: I fully support a gift to Gliese 581.
The only thing I'd warn against is miscommunication.
Should we even make it seem as
if we expect anything in return?

thisisfutile888 (forum guest): Okay, musings,
I think you've logged into the wrong
forum. Space Cakes Conversations is another
Google button away. Seriously, ‘documentaries’
about a human baby and another centred
around humanity’s thirst for blood and conquest?
The first idea makes me want to vomit,
and the second will get us all killed. Why not
load up Free Willy and The Little Mermaid
and force a couple of death-row prisoners to
care for them? Gliese 581 will piss delight.

twinkle_twinkle: I’ve banned thisisfutile888.
The future is just lost on some
people.


Ken Liu is a lawyer and programmer. You can 
read more of his fiction at http://kenliu.name/stories.
Shelly Li does not believe in working,
although her first novel, The Royal Hunter, is
forthcoming from Penguin Books in 2011. To
learn more, visit www.shelly-li.com.